If I overwrite the final layer of the pretrained model, training works fine. But if I remove the final layer and then re-append the same layer, training fails.
I'm starting from this tutorial: https://pytorch.org/tutorials/beginner/transfer_learning_tutorial.html
Here’s my final code:
```python
import torch
import torch.nn as nn
import torch.optim as optim
from torch.optim import lr_scheduler
import torchvision.models as models

# criterion and train_model are defined as in the tutorial linked above.

# Pretrained ResNet-152 up to the second-to-last layer, used as a feature extractor.
resnet152 = models.resnet152(pretrained=True)
last_layer = list(resnet152.children())[-1]  # the original pretrained fc layer
modules = list(resnet152.children())[:-1]
resnet152 = nn.Sequential(*modules)
for p in resnet152.parameters():
    p.requires_grad = False
# On an nn.Sequential, this registers fc as an extra submodule appended after avgpool.
resnet152.fc = last_layer
optimizer_conv = optim.SGD(resnet152.fc.parameters(), lr=0.001, momentum=0.9)
exp_lr_scheduler = lr_scheduler.StepLR(optimizer_conv, step_size=5, gamma=0.1)
model_ft = train_model(resnet152, criterion, optimizer_conv, exp_lr_scheduler, num_epochs=10)
```
This will fail with:
```
RuntimeError: size mismatch, m1: [8192 x 1], m2: [2048 x 1000] at ../aten/src/TH/generic/THTensorMath.cpp:961
```
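The numbers in this message can be reproduced by hand. nn.Linear only multiplies over the last dimension of its input and folds every leading dimension into the rows of the left operand (m1), while m2 is the transposed weight of shape [in_features x out_features]. A minimal sketch, assuming the batch size of 4 used by the tutorial's DataLoader and assuming the pooled features reach fc still shaped [N, 2048, 1, 1]; linear_operand_shapes is a hypothetical helper, not a PyTorch function:

```python
from math import prod


def linear_operand_shapes(input_shape, in_features, out_features):
    """Hypothetical helper: the (m1, m2) shapes an nn.Linear matmul works with.

    nn.Linear contracts over the LAST input dimension only, so all leading
    dimensions are folded into the rows of m1; m2 is the transposed weight.
    """
    m1 = (prod(input_shape[:-1]), input_shape[-1])
    m2 = (in_features, out_features)
    return m1, m2


# Assumed avgpool output for a batch of 4: 2048 channels, 1x1 spatial, NOT flattened.
unflattened = (4, 2048, 1, 1)
print(linear_operand_shapes(unflattened, 2048, 1000))
# → ((8192, 1), (2048, 1000))   inner dimensions 1 and 2048 disagree

# The same features after flattening to [N, 2048].
flattened = (4, 2048)
print(linear_operand_shapes(flattened, 2048, 1000))
# → ((4, 2048), (2048, 1000))   inner dimensions agree
```

Under those assumptions, m1: [8192 x 1] is exactly what an un-flattened [4, 2048, 1, 1] tensor produces, which points at a missing flatten step between avgpool and fc rather than at the fc layer itself.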
But if I do the same thing and instead overwrite the .fc layer directly:
```python
resnet152 = models.resnet152(pretrained=True)
for p in resnet152.parameters():
    p.requires_grad = False
# Replace the classifier head in place; ResNet's own forward() is kept.
num_ftrs = resnet152.fc.in_features
resnet152.fc = nn.Linear(num_ftrs, 1000)
optimizer_conv = optim.SGD(resnet152.fc.parameters(), lr=0.001, momentum=0.9)
exp_lr_scheduler = lr_scheduler.StepLR(optimizer_conv, step_size=5, gamma=0.1)
model_ft = train_model(resnet152, criterion, optimizer_conv, exp_lr_scheduler, num_epochs=10)
```
This will train fine.
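One likely reason the two versions behave differently: in torchvision's ResNet, forward() calls torch.flatten(x, 1) between avgpool and fc, and that call is not a child module, so a model rebuilt with nn.Sequential(*children()) never runs it. The sketch below checks this on TinyNet, a hypothetical toy model that mirrors ResNet's "avgpool, flatten, fc" ending so nothing has to be downloaded. Adding nn.Flatten(1) to the Sequential is one possible fix, not something the tutorial itself does:

```python
import torch
import torch.nn as nn


class TinyNet(nn.Module):
    """Hypothetical stand-in for ResNet: forward() flattens before fc."""

    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 8, kernel_size=3, padding=1)
        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
        self.fc = nn.Linear(8, 10)

    def forward(self, x):
        x = self.avgpool(self.conv(x))
        x = torch.flatten(x, 1)  # a function call, NOT a child module
        return self.fc(x)


torch.manual_seed(0)
net = TinyNet().eval()
x = torch.randn(4, 3, 16, 16)

# Rebuilding from children() keeps conv and avgpool but loses the flatten call.
broken = nn.Sequential(*list(net.children())[:-1])
broken.fc = net.fc  # registered as an extra submodule, run right after avgpool
try:
    broken(x)
except RuntimeError as err:
    print("fails:", err)

# Inserting nn.Flatten(1) restores the missing step.
fixed = nn.Sequential(*list(net.children())[:-1], nn.Flatten(1), net.fc)
print(torch.allclose(fixed(x), net(x)))  # → True
```

Applied to the original code, the analogous change would be to build the model as nn.Sequential(*modules, nn.Flatten(1), last_layer) instead of assigning .fc afterwards.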
Even a check that the parameters are equivalent to those of a fresh pretrained model passes:
```python
resnet152_2 = models.resnet152(pretrained=True)
for param, param2 in zip(resnet152.parameters(), resnet152_2.parameters()):
    assert torch.equal(param, param2)  # compare whole tensors, not just the first slice
```
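Passing this check does not show that the two models compute the same thing: parameters() only yields learnable tensors, and a flatten step has none, so two models can hold identical parameters yet run different forward computations. A minimal illustration, assuming only torch; with_flatten and without_flatten are illustrative names:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
fc = nn.Linear(8, 10)

# Two containers sharing the very same Linear layer, hence identical parameters.
with_flatten = nn.Sequential(nn.Flatten(1), fc)
without_flatten = nn.Sequential(fc)

same_params = all(
    torch.equal(p, q)  # torch.equal compares entire tensors
    for p, q in zip(with_flatten.parameters(), without_flatten.parameters())
)
print(same_params)  # → True

x = torch.randn(4, 8, 1, 1)  # shaped like an un-flattened avgpool output
print(with_flatten(x).shape)  # → torch.Size([4, 10])
try:
    without_flatten(x)
except RuntimeError:
    print("same parameters, different forward: the un-flattened input is rejected")
```

Comparing outputs on the same input (for example with torch.allclose) is a more direct test of equivalence than comparing parameters.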