I am trying to use Resnet50 for regression

algeriapy · May 14, 2020, 2:11am

Hi,
I am trying to use a pretrained ResNet50 for a regression task.
I changed the output size of the fc layer to 1:
model = torchvision.models.resnet50(pretrained=True)
model.fc.out_features = 1

I am using MSELoss as the loss function, and my batch_size = 15:
criterion = nn.MSELoss()
preds = network(images)
loss = criterion(preds, labels)

I am getting the following error:

return torch._C._VariableFunctions.broadcast_tensors(tensors)

RuntimeError: The size of tensor a (15000) must match the size of tensor b (15) at non-singleton dimension 0

futscdav · May 14, 2020, 2:31am

You need to replace the entire fc layer, likely something like
model.fc = nn.Linear(2048, 1)
should be what you want.
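
A minimal sketch of why assigning to `out_features` has no effect while replacing the layer does. To keep it lightweight, a standalone `nn.Linear(2048, 1000)` stands in for ResNet50's `fc` layer (an assumption for illustration; the real model is not loaded here), and random tensors stand in for the real features and labels.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-in for ResNet50's final layer: 2048 input features, 1000 ImageNet classes.
fc = nn.Linear(2048, 1000)

# Assigning to out_features only changes an attribute; the weight matrix
# that actually produces the output keeps its original shape.
fc.out_features = 1
features = torch.randn(15, 2048)      # hypothetical batch of 15 pooled feature vectors
old_shape = fc(features).shape        # still 1000 outputs per sample

# Replacing the layer creates new parameters with the desired shape.
fc = nn.Linear(2048, 1)
preds = fc(features)                  # shape (15, 1)

# Labels of shape (15,) do not line up with predictions of shape (15, 1);
# dropping the trailing dimension keeps MSELoss from broadcasting.
labels = torch.rand(15) * 8           # hypothetical regression targets in [0, 8)
criterion = nn.MSELoss()
loss = criterion(preds.squeeze(1), labels)

print(old_shape, preds.shape, loss.shape)
```

The newly created layer starts from random weights, so it has to be trained even when the rest of the network is pretrained.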

algeriapy · May 14, 2020, 3:08am

Thanks a lot
I tried this solution before, but I think it did not work because I had not restarted the kernel

Now it is working

algeriapy · May 14, 2020, 7:22pm

I have another problem.
When I used the same network for classification with 8 classes and batch_size = 15, it worked.
But when I use it for regression (output = 1), even with batch_size = 8,
it reports:

training, momentum, eps, torch.backends.cudnn.enabled

RuntimeError: CUDA out of memory. Tried to allocate 2.00 MiB (GPU 0; 6.00 GiB total capacity; 4.43 GiB already allocated; 1.14 MiB free; 161.30 MiB cached)

What do you think causes this issue?

Anshumaan_Dash · May 14, 2020, 7:26pm

Try to restart the kernel again before loading the model.

algeriapy · May 14, 2020, 7:34pm

Yes, I did that,
but it did not solve the issue.

Anshumaan_Dash · May 14, 2020, 7:55pm

Maybe your training data is huge. I mean, each sample of your training data is huge. By the way, could you share the nature of the input (I am assuming images) and labels that you are feeding to the model? Is it the same for both classification and regression?

algeriapy · May 14, 2020, 8:00pm

Yes, they are the same. I feed the same data to the network: input images of shape (3, 224, 224).
The only difference is the labels.
In the first case I replaced the regression values 0-8 with classes 0-1, 1-2, …
In the second case I used the true regression value.

Anshumaan_Dash · May 14, 2020, 8:03pm

Could you check if a smaller batch_size works in this case? If that works, you can accumulate the gradients and run the weight update every n batches.
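
The gradient accumulation idea mentioned above can be sketched as follows. This is only an illustration under assumptions: the tiny `nn.Linear` model, the random batches, and the name `accum_steps` are hypothetical stand-ins, not the poster's actual training loop.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

model = nn.Linear(10, 1)                       # hypothetical stand-in for the real network
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
criterion = nn.MSELoss()

accum_steps = 4                                # number of small batches per weight update
# Eight hypothetical mini-batches of 2 samples each.
batches = [(torch.randn(2, 10), torch.randn(2)) for _ in range(8)]

num_updates = 0
optimizer.zero_grad()
for step, (inputs, targets) in enumerate(batches, start=1):
    preds = model(inputs).squeeze(1)
    # Scale the loss so the accumulated gradient is the average over accum_steps batches.
    loss = criterion(preds, targets) / accum_steps
    loss.backward()                            # gradients add up in .grad across calls
    if step % accum_steps == 0:
        optimizer.step()                       # one update per accum_steps batches
        optimizer.zero_grad()
        num_updates += 1

print(num_updates)
```

Each forward and backward pass only holds the activations of one small batch, so peak memory follows the small batch size while the update approximates a larger one. Batch norm statistics are still computed per small batch.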

algeriapy · May 14, 2020, 8:08pm

Looking at the whole training process:
The training phase runs for 209 iterations and finishes without any problem.
The run stops during the validation evaluation (the validation set has 1100 samples).
It is interrupted after 4% of the validation, where I use batch_size = 1.

Does that make any sense?

Anshumaan_Dash · May 14, 2020, 8:13pm

Sure. Can you check one more thing: see whether you are moving the model between GPU and CPU between the training and validation phases.

And, if possible, try to delete the batches at the end of training and validation, like del images; del labels

algeriapy · May 14, 2020, 9:45pm

Thanks bro,
I am not moving the model between GPU and CPU between training and validation.

I solved the problem by first adding del images; del labels
and by adding this in the validation phase:
model.eval()
with torch.no_grad():
    preds = model(images)

previously, it was

model.eval()
preds = model(images)

Anshumaan_Dash · May 14, 2020, 9:57pm

Ah! I see. So you were keeping the gradients as well.

algeriapy · May 14, 2020, 11:26pm

I thought model.eval() would be enough, because there is no
loss.backward()
optimizer.step()
in the validation phase.
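
A small sketch of the difference discussed above. `model.eval()` only switches layers such as dropout and batch norm to inference behaviour; autograd still records the computation graph and keeps intermediate activations alive for as long as the output is referenced, whether or not `loss.backward()` is ever called. `torch.no_grad()` is what turns that recording off. The small `nn.Sequential` model and the random input here are hypothetical stand-ins for the real network and images.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-in for the real network.
model = nn.Sequential(nn.Linear(10, 10), nn.Dropout(0.5), nn.Linear(10, 1))
images = torch.randn(4, 10)

model.eval()                              # dropout is now a no-op, but autograd is still on

preds_eval_only = model(images)           # graph is recorded; activations are retained
with torch.no_grad():
    preds_no_grad = model(images)         # no graph is recorded

print(preds_eval_only.requires_grad, preds_eval_only.grad_fn is not None)
print(preds_no_grad.requires_grad, preds_no_grad.grad_fn is None)
```

If validation outputs or losses are stored across iterations without detaching them (for example by summing `loss` instead of `loss.item()`), each stored tensor can keep its whole graph in memory, which is one plausible way for validation to run out of memory even with batch_size = 1.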