ResNet custom head size mismatch error

I’m loading ResNet-50 from torch hub, cutting off the head (the last two nn modules), keeping the base, and attaching a custom head on top that is exactly the same as the original ResNet head (I recreated the same head to debug the size-mismatch error).

But I still hit the size-mismatch error. I don’t understand why, because when I print out the layers they are exactly the same.

I’ve attached code to reproduce the result:

import torch
import torch.nn as nn

x = torch.randn([2, 3, 50, 50])

model = torch.hub.load('pytorch/vision:v0.5.0', 'resnet50', pretrained=True)
print(model)

# feature extraction layers (everything except the last two modules)
base_model_list = list(model.children())[:-2]
# re-attach the same head layers as the loaded resnet
layers = base_model_list + list(model.children())[-2:]
new_model = nn.Sequential(*layers)
print(new_model)

model(x) # executes with no error 
new_model(x) # runtime error: size mismatch, m1: [4096 x 1], m2: [2048 x 1000]

Does anyone know why? Is there a hidden operation that gets dropped when I separate the base? The layers look identical when printed.

You are missing the flatten operation, which ResNet applies inside its forward method using the functional API (torch.flatten) rather than as a child module, so it is lost when you rebuild the model from .children().
This code should work:

# feature extraction layers
base_model_list = list(model.children())[:-2]
# original avgpool, then an explicit flatten, then the original fc layer
layers = base_model_list + list(model.children())[-2:-1] + [nn.Flatten()] + list(model.children())[-1:]
new_model = nn.Sequential(*layers)
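
With the explicit nn.Flatten() in place, the rebuilt model should produce the same output as the original. A quick sanity check (a sketch, assuming the code above has already been run; eval mode keeps batch norm deterministic):

model.eval()
new_model.eval()
with torch.no_grad():
    out_original = model(x)
    out_rebuilt = new_model(x)
print(torch.allclose(out_original, out_rebuilt))  # expected: True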

Generally nn.Sequential is used for very simple models, so wrapping all child modules inside this container might not always work out of the box.
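
If you actually want a custom head, a more robust pattern is to wrap the backbone in a small nn.Module and handle the flatten explicitly in forward. A minimal sketch (the class name, head size, and num_classes here are illustrative, not from the original post):

class ResNetWithCustomHead(nn.Module):
    def __init__(self, backbone, num_classes):
        super().__init__()
        # everything except the original avgpool and fc
        self.features = nn.Sequential(*list(backbone.children())[:-2])
        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
        self.head = nn.Linear(2048, num_classes)  # 2048 = resnet50 feature width

    def forward(self, x):
        x = self.features(x)
        x = self.avgpool(x)
        x = torch.flatten(x, 1)  # the step that nn.Sequential(*children) loses
        return self.head(x)

custom = ResNetWithCustomHead(model, num_classes=10)
print(custom(x).shape)  # torch.Size([2, 10])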


I read the above comment after seeing many suggestions on the web to do transfer learning with the nn.Sequential(*model.children()) approach. It was still unclear to me whether that approach is correct when you only want to fine-tune the last layer. After seeing https://github.com/pytorch/pytorch/issues/15129 it made sense that this is not the right approach: you cannot express ResNet through its .children() alone, because operations performed in forward (such as the flatten) are not child modules. I just wanted to call that out explicitly to save people some time. I ended up simply replacing the last layer with resnet.fc = nn.Linear(num_ftrs, my_num_classes).
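
For reference, a minimal sketch of that last-layer replacement; freezing the backbone and the class count of 10 are my assumptions about a typical fine-tuning setup, not something stated above:

import torch
import torch.nn as nn

resnet = torch.hub.load('pytorch/vision:v0.5.0', 'resnet50', pretrained=True)

# freeze the pretrained backbone (assumed fine-tuning setup)
for param in resnet.parameters():
    param.requires_grad = False

# swap in a new final layer; only its parameters will be trained
num_ftrs = resnet.fc.in_features
resnet.fc = nn.Linear(num_ftrs, 10)  # 10 = hypothetical number of classes

optimizer = torch.optim.SGD(resnet.fc.parameters(), lr=1e-3)

Because the module structure is untouched, the original forward (including the functional flatten) still runs correctly.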