ResNet custom head size mismatch error

I’m loading ResNet-50 from torch hub, cutting off the head (the last two nn modules), keeping the base, and attaching a custom head on top that is exactly the same as the original ResNet head (I recreated the same head to debug the size-mismatch error).

But I still hit the size-mismatch error. I don’t understand why, because when I print out the layers they are exactly the same.

I’ve attached code to reproduce the result:

import torch
import torch.nn as nn

x = torch.randn([2, 3, 50, 50])

model = torch.hub.load('pytorch/vision:v0.5.0', 'resnet50', pretrained=True)
print(model)

# feature extraction layers (everything except the last two modules)
base_model_list = list(model.children())[:-2]
# re-attach the same head layers as the loaded resnet
layers = base_model_list + list(model.children())[-2:]
new_model = nn.Sequential(*layers)
print(new_model)

model(x) # executes with no error 
new_model(x) # runtime error: size mismatch, m1: [4096 x 1], m2: [2048 x 1000]

Does anyone know why? Is there a hidden operation that gets dropped when I separate the base? The layers look identical when printed.

You are missing the flatten operation, which ResNet applies inside its forward method using the functional API (torch.flatten) rather than as a child module, so it is lost when you rebuild the model from .children().
This code should work:

# feature extraction layers
base_model_list = list(model.children())[:-2]
# original avgpool, then an explicit flatten, then the original fc layer
layers = base_model_list + list(model.children())[-2:-1] + [nn.Flatten()] + list(model.children())[-1:]
new_model = nn.Sequential(*layers)
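
With the explicit nn.Flatten() in place, the rebuilt model should produce the same output as the original. A quick sanity check (a sketch, assuming the code above has already been run; eval mode keeps batch norm deterministic):

model.eval()
new_model.eval()
with torch.no_grad():
    out_original = model(x)
    out_rebuilt = new_model(x)
print(torch.allclose(out_original, out_rebuilt))  # expected: True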

Generally nn.Sequential is used for very simple models, so wrapping all child modules inside this container might not always work out of the box.
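
If you actually want a custom head, a more robust pattern is to wrap the backbone in a small nn.Module and handle the flatten explicitly in forward. A minimal sketch (the class name, head size, and num_classes here are illustrative, not from the original post):

class ResNetWithCustomHead(nn.Module):
    def __init__(self, backbone, num_classes):
        super().__init__()
        # everything except the original avgpool and fc
        self.features = nn.Sequential(*list(backbone.children())[:-2])
        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
        self.head = nn.Linear(2048, num_classes)  # 2048 = resnet50 feature width

    def forward(self, x):
        x = self.features(x)
        x = self.avgpool(x)
        x = torch.flatten(x, 1)  # the step that nn.Sequential(*children) loses
        return self.head(x)

custom = ResNetWithCustomHead(model, num_classes=10)
print(custom(x).shape)  # torch.Size([2, 10])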


I read the above comment after seeing many suggestions on the web to do transfer learning with the nn.Sequential(*model.children()) approach. It was still unclear to me whether that approach is correct when you only want to fine-tune the last layer. After seeing https://github.com/pytorch/pytorch/issues/15129 it made sense that this is not the right approach: you cannot express ResNet through its .children() alone, because operations performed in forward (such as the flatten) are not child modules. I just wanted to call that out explicitly to save people some time. I ended up simply replacing the last layer with resnet.fc = nn.Linear(num_ftrs, my_num_classes).
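
For reference, a minimal sketch of that last-layer replacement; freezing the backbone and the class count of 10 are my assumptions about a typical fine-tuning setup, not something stated above:

import torch
import torch.nn as nn

resnet = torch.hub.load('pytorch/vision:v0.5.0', 'resnet50', pretrained=True)

# freeze the pretrained backbone (assumed fine-tuning setup)
for param in resnet.parameters():
    param.requires_grad = False

# swap in a new final layer; only its parameters will be trained
num_ftrs = resnet.fc.in_features
resnet.fc = nn.Linear(num_ftrs, 10)  # 10 = hypothetical number of classes

optimizer = torch.optim.SGD(resnet.fc.parameters(), lr=1e-3)

Because the module structure is untouched, the original forward (including the functional flatten) still runs correctly.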