I am trying to use Resnet50 for regression

Hi,
I am trying to use a pretrained ResNet50 for a regression task.
I changed the output size of the fc layer to 1:
model = torchvision.models.resnet50(pretrained=True)
model.fc.out_features = 1

I am using MSELoss as the loss function, with batch_size = 15:
criterion = nn.MSELoss()
preds = network(images)
loss = criterion(preds, labels)

I am getting the following error:

return torch._C._VariableFunctions.broadcast_tensors(tensors)

RuntimeError: The size of tensor a (15000) must match the size of tensor b (15) at non-singleton dimension 0

You need to replace the entire fc layer, likely something like
model.fc = nn.Linear(2048, 1)
should be what you want.
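
To illustrate why the attribute assignment has no effect, here is a minimal sketch. It assumes a plain nn.Linear(2048, 1000) as a stand-in for the ResNet50 fc layer and a hypothetical batch of 15 random feature vectors, so nothing needs to be downloaded. Setting out_features only changes a stored number; the weight matrix keeps its original shape, which is consistent with the 15 x 1000 = 15000 elements in the error above.

```python
import torch
import torch.nn as nn

# Stand-in for ResNet50's final layer (2048 input features, 1000 classes).
fc = nn.Linear(2048, 1000)

# Assigning the attribute does not touch the weight or bias tensors.
fc.out_features = 1
features = torch.randn(15, 2048)  # hypothetical batch of 15 pooled feature vectors
shape_after_attribute_change = tuple(fc(features).shape)  # still 1000 outputs per sample

# Replacing the layer creates new parameters with the intended shape.
fc = nn.Linear(2048, 1)
preds = fc(features)
shape_after_replacement = tuple(preds.shape)

# MSELoss works best when preds and labels have the same shape, so drop the
# trailing dimension if the labels have shape (batch_size,).
labels = torch.randn(15)  # hypothetical regression targets
loss = nn.MSELoss()(preds.squeeze(1), labels)
```

With the real model, the same idea would be model.fc = nn.Linear(model.fc.in_features, 1), which avoids hard-coding 2048.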

Thanks a lot.
I tried this solution before, but I think it did not work because I had not restarted the kernel.

Now it is working.

I have another problem.
When I used the same network for classification with 8 classes, batch_size = 15 worked,
but when I use it for regression (output = 1), even batch_size = 8
fails with:

training, momentum, eps, torch.backends.cudnn.enabled

RuntimeError: CUDA out of memory. Tried to allocate 2.00 MiB (GPU 0; 6.00 GiB total capacity; 4.43 GiB already allocated; 1.14 MiB free; 161.30 MiB cached)

What do you think causes this issue?

Try to restart the kernel again before loading the model.

Yes, I did that,
but it did not solve the issue.

Maybe your training data is huge. I mean, each sample of your training data is huge. By the way, could you share the nature of the input (I am assuming images) and labels that you are feeding to the model? Is it the same for both classification and regression?

Yes,
they are the same. I feed the same data to the network: input images of shape (3, 224, 224).
The only difference is the labels.
In the first case I replaced the regression values in the range 0-8 with classes 0-1, 1-2, …
and in the second case I used the true regression value.

Could you check if a smaller batch_size works in this case? If that works, you can accumulate the gradients and run the weight update every n batches.
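
Gradient accumulation as suggested above could look roughly like the following sketch. The tiny linear model, the random dummy batches, and the name accum_steps are all hypothetical placeholders, not the setup from this thread.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(10, 1)  # hypothetical stand-in for the real network
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

accum_steps = 4  # run one weight update every 4 small batches
# Dummy data: 8 batches of 2 samples each.
batches = [(torch.randn(2, 10), torch.randn(2)) for _ in range(8)]

num_updates = 0
optimizer.zero_grad()
for step, (inputs, targets) in enumerate(batches, start=1):
    preds = model(inputs).squeeze(1)
    # Scale the loss so the summed gradient matches an average over the larger batch.
    loss = criterion(preds, targets) / accum_steps
    loss.backward()  # gradients add up in .grad across calls
    if step % accum_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
        num_updates += 1
```

Only one small batch is held in memory at a time, while the weight update behaves approximately like one with a batch accum_steps times larger (layers such as BatchNorm still see the small batch).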

Over the whole training process:
The training phase runs for 209 iterations and finishes without any problem.
The run stops during the validation evaluation (the validation set has 1100 samples):
it is interrupted after 4% (in validation I used batch_size = 1).

Does that make any sense?

Sure. Can you check one more thing: are you moving the model between GPU and CPU between the training and validation phases?

And, if possible, try deleting the batches at the end of each training and validation iteration, like del images; del labels

Thanks bro,
I am not moving the model between GPU and CPU between training and validation.

I solved the problem, first by adding del images; del labels,
and I also added this in the validation phase:
model.eval()
with torch.no_grad():
    preds = model(images)

previously, it was

model.eval()
preds = model(images)

Ah! I see. So you were keeping the gradients as well.

I thought it would be enough, because there is no
loss.backward()
optimizer.step()
in the validation phase.
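
For anyone reading later, a small sketch of the difference, using a hypothetical tiny model and random inputs in place of the real network and images. model.eval() only switches layers such as BatchNorm and Dropout to inference behaviour. The forward pass still records the autograd graph and keeps intermediate activations alive unless it runs under torch.no_grad(), even if loss.backward() is never called.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the real network and a validation batch.
model = nn.Sequential(nn.Linear(8, 8), nn.ReLU(), nn.Linear(8, 1))
images = torch.randn(4, 8)

model.eval()

# Without no_grad: the output is still attached to an autograd graph.
preds_tracked = model(images)
tracked = preds_tracked.requires_grad
has_graph = preds_tracked.grad_fn is not None

# With no_grad: no graph is recorded, so activations can be freed right away.
with torch.no_grad():
    preds = model(images)
untracked = preds.requires_grad
```

A related pitfall is summing the loss tensor itself across validation batches, which can keep each batch's graph alive; accumulating loss.item() avoids that.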