Editing layers of a pretrained model makes fine-tuning fail

If I overwrite a layer in the pretrained model, training works fine, but if I remove the final layer and then re-append the same layer, training fails.

I’m starting here: https://pytorch.org/tutorials/beginner/transfer_learning_tutorial.html

Here’s my final code:

import torch
import torch.nn as nn
import torch.optim as optim
from torch.optim import lr_scheduler
import torchvision.models as models

# Pretrained resnet152, truncated before the final fc layer, as a feature extractor.
resnet152 = models.resnet152(pretrained=True)

last_layer = list(resnet152.children())[-1]
modules = list(resnet152.children())[:-1]
resnet152 = nn.Sequential(*modules)
for p in resnet152.parameters():
    p.requires_grad = False

# Re-append the original fc layer.
resnet152.fc = last_layer

criterion = nn.CrossEntropyLoss()
optimizer_conv = optim.SGD(resnet152.fc.parameters(), lr=0.001, momentum=0.9)

exp_lr_scheduler = lr_scheduler.StepLR(optimizer_conv, step_size=5, gamma=0.1)

# train_model is the helper from the transfer learning tutorial linked above.
model_ft = train_model(resnet152, criterion, optimizer_conv, exp_lr_scheduler, num_epochs=10)

This will fail with:

RuntimeError: size mismatch, m1: [8192 x 1], m2: [2048 x 1000] at ../aten/src/TH/generic/THTensorMath.cpp:961

But if I do the same thing and instead overwrite the .fc layer in place:

resnet152 = models.resnet152(pretrained=True)

for p in resnet152.parameters():
    p.requires_grad = False

# Replace the fc layer in place; the model's original forward method is kept.
num_ftrs = resnet152.fc.in_features
resnet152.fc = nn.Linear(num_ftrs, 1000)

optimizer_conv = optim.SGD(resnet152.fc.parameters(), lr=0.001, momentum=0.9)

exp_lr_scheduler = lr_scheduler.StepLR(optimizer_conv, step_size=5, gamma=0.1)

model_ft = train_model(resnet152, criterion, optimizer_conv, exp_lr_scheduler, num_epochs=10)

This will train fine.

Even when I check that the two models' parameters are equal, the assertion passes:

resnet152_2 = models.resnet152(pretrained=True)

# Compare every parameter of the rewrapped model against a fresh pretrained copy.
for param, param2 in zip(resnet152.parameters(), resnet152_2.parameters()):
    assert torch.equal(param, param2)

If you wrap all modules in an nn.Sequential container, you are not using the original forward method anymore, and are therefore missing the reshape that ResNet's forward performs between the average pooling and the fc layer (x = x.view(x.size(0), -1)).
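
To make the mismatch visible, here is a minimal sketch (backbone is just a throwaway name, and the batch size of 4 is a guess that matches 8192 = 4 * 2048 in the error above): the truncated model still outputs 4-dimensional activations, while nn.Linear expects the 2048 features in the last dimension.

backbone = nn.Sequential(*list(models.resnet152(pretrained=True).children())[:-1])
features = backbone(torch.randn(4, 3, 224, 224))
print(features.shape)  # torch.Size([4, 2048, 1, 1]) - still 4-dimensional
# nn.Linear(2048, 1000) acts on the last dimension (here of size 1),
# so applying the re-appended fc layer raises the size mismatch above.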

You could add a custom Flatten layer or write your own forward method using a custom nn.Module:

class Flatten(nn.Module):
    def __init__(self):
        super(Flatten, self).__init__()

    def forward(self, x):
        # Reshape [N, C, 1, 1] pooled activations to [N, C], as the original forward does.
        x = x.view(x.size(0), -1)
        return x

resnet152 = models.resnet152(pretrained=True)  # start again from the stock pretrained model

last_layer = list(resnet152.children())[-1]
modules = list(resnet152.children())[:-1]
resnet152 = nn.Sequential(*modules)
for p in resnet152.parameters():
    p.requires_grad = False

# Flatten the pooled features before handing them to the original fc layer.
resnet152.fc = nn.Sequential(
    Flatten(),
    last_layer
)

x = torch.randn(2, 3, 224, 224)
output = resnet152(x)
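
As a quick sanity check (a minimal sketch; out_wrapped and out_stock are just local names, and resnet152_2 is a freshly loaded copy of the stock model), the wrapped model in eval mode should reproduce the original forward pass:

resnet152_2 = models.resnet152(pretrained=True)
resnet152.eval()
resnet152_2.eval()

with torch.no_grad():
    out_wrapped = resnet152(x)   # Sequential backbone + Flatten + original fc
    out_stock = resnet152_2(x)   # unmodified model with its original forward

print(out_wrapped.shape)  # torch.Size([2, 1000])
assert torch.allclose(out_wrapped, out_stock, atol=1e-5)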

Thank you! So I assume that if I had an F.normalize(x, p=2, dim=1) call in my forward pass, I would just replace it with:

import torch.nn.functional as F

class normalize(nn.Module):
    def __init__(self):
        super(normalize, self).__init__()

    def forward(self, x):
        # L2-normalize each sample's feature vector along dim=1.
        x = F.normalize(x, p=2, dim=1)
        return x
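
For example (a sketch that assumes the normalization was applied to the flattened features right before the classifier in your original forward), the module would slot into the new head like this:

resnet152.fc = nn.Sequential(
    Flatten(),
    normalize(),   # stands in for the F.normalize call from the original forward
    last_layer
)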

Yes, all functional calls in the original forward will be lost if you rewrap the submodules in a new module, so your approach should be correct.
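
As a quick sanity check (a minimal sketch; t is just a dummy feature batch), the module wrapper reproduces the functional call exactly:

t = torch.randn(2, 2048)
assert torch.equal(normalize()(t), F.normalize(t, p=2, dim=1))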