Can anyone tell me how to do fine-tuning in PyTorch? Suppose I have loaded the ResNet-18 pretrained model. Now I want to fine-tune it on my own dataset, which contains, say, 10 classes. How do I remove the last output layer and change it to fit my requirements?
You can find an example at the bottom of this section of the autograd mechanics notes.
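For reference, that example boils down to something like the following (a minimal sketch, assuming 10 target classes and the standard torchvision ResNet-18):

import torch
import torchvision

# Load the pretrained network and freeze all of its parameters
model = torchvision.models.resnet18(pretrained=True)
for param in model.parameters():
    param.requires_grad = False

# Replace the last fully-connected layer.
# Parameters of newly constructed modules have requires_grad=True by default.
model.fc = torch.nn.Linear(512, 10)

# Optimize only the classifier; the frozen base receives no gradients
optimizer = torch.optim.SGD(model.fc.parameters(), lr=1e-2, momentum=0.9)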
Thanks for your reply.
I have another doubt here. While fine-tuning, should we not use a smaller learning rate for the whole network (http://cs231n.github.io/transfer-learning/)? Here the example says to freeze the network and only learn the weights of the newly added layers. Which one is recommended?
In the snippet I’ve sent you, only the last layer is optimized. The rest will be frozen (the base parameters will receive no gradients, since they have requires_grad=False). This is as if you used a learning rate of 0 for that part. If you want to use a lower learning rate for the base, see this section of the optim docs.
optimizer = torch.optim.SGD([
    {'params': model.conv1.parameters()},
    {'params': model.bn1.parameters()},
    {'params': model.relu.parameters()},
    {'params': model.maxpool.parameters()},
    {'params': model.layer1.parameters()},
    {'params': model.layer2.parameters()},
    {'params': model.layer3.parameters()},
    {'params': model.layer4.parameters()},
    {'params': model.avgpool.parameters()},
    {'params': model.fc.parameters(), 'lr': opt.lr}
], lr=opt.lr*0.1, momentum=0.9)
Is this the correct way of assigning different learning rates to different layers of ResNet-18? Or is there a better way to do it?
This should do it:
# Collect the ids of the parameters that should get the full learning rate (the new fc layer)
ignored_params = list(map(id, model.fc.parameters()))
# Everything else is trained with a 10x smaller learning rate
base_params = filter(lambda p: id(p) not in ignored_params,
                     model.parameters())

optimizer = torch.optim.SGD([
    {'params': base_params},
    {'params': model.fc.parameters(), 'lr': opt.lr}
], lr=opt.lr*0.1, momentum=0.9)
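If you prefer to avoid the id() bookkeeping, an equivalent split can be done by parameter name (a sketch, assuming the standard torchvision ResNet where the classifier attribute is named fc):

base_params = [p for n, p in model.named_parameters() if not n.startswith('fc.')]
fc_params = [p for n, p in model.named_parameters() if n.startswith('fc.')]

optimizer = torch.optim.SGD([
    {'params': base_params},
    {'params': fc_params, 'lr': opt.lr}
], lr=opt.lr*0.1, momentum=0.9)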
Great! Thanks
This may also help you learn how to modify one layer without changing the other layers’ parameters and how to construct a new model.
model = models.vgg16(pretrained=True)
print(list(list(model.classifier.children())[1].parameters()))
mod = list(model.classifier.children())
mod.pop()  # drop the last Linear(4096, 1000) ImageNet classifier
mod.append(torch.nn.Linear(4096, 2))  # add a new 2-class output layer
new_classifier = torch.nn.Sequential(*mod)
print(list(list(new_classifier.children())[1].parameters()))
model.classifier = new_classifier
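A quick sanity check of the modified model (assuming a recent PyTorch where plain tensors can be fed directly; 224x224 is the standard input size for torchvision's VGG):

x = torch.randn(1, 3, 224, 224)
print(model(x).shape)  # torch.Size([1, 2]) -- two-class output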
Also, you may consider adding pretrained models for VGG, which you can find here:
As for fine-tuning ResNet, it is easier:
model = models.resnet18(pretrained=True)
model.fc = torch.nn.Linear(512, 2)  # resnet18's fc takes 512 input features (2048 would be resnet50)
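To avoid hardcoding the feature size, you can also read it off the existing layer (a small sketch; this works for any torchvision ResNet variant):

model = models.resnet18(pretrained=True)
# in_features is 512 for resnet18/34 and 2048 for resnet50/101/152
model.fc = torch.nn.Linear(model.fc.in_features, 2)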
How do I add new layers to existing pretrained models? Here, the last layer is replaced by assigning a new Linear layer to it by name. How can I add another layer after that?
class MyModel(nn.Module):
    def __init__(self, pretrained_model):
        super(MyModel, self).__init__()
        self.pretrained_model = pretrained_model
        self.last_layer = ... # create layer

    def forward(self, x):
        return self.last_layer(self.pretrained_model(x))
pretrained_model = torchvision.models.resnet18(pretrained=True)
model = MyModel(pretrained_model)
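Note that the wrapped resnet18 still ends in its original 1000-way ImageNet classifier, so the extra layer maps those 1000 outputs to your classes. A filled-in sketch (the 10-class Linear layer is just an assumed example, not part of the snippet above):

import torch.nn as nn
import torchvision

class MyModel(nn.Module):
    def __init__(self, pretrained_model, num_classes=10):
        super(MyModel, self).__init__()
        self.pretrained_model = pretrained_model
        # pretrained_model(x) returns 1000 ImageNet logits; map them to num_classes
        self.last_layer = nn.Linear(1000, num_classes)

    def forward(self, x):
        return self.last_layer(self.pretrained_model(x))

model = MyModel(torchvision.models.resnet18(pretrained=True))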
Thank you for the help.
Another question: I am not able to see a classifier attribute when loading a pretrained ResNet-18. However, I can directly access the children attribute without needing classifier. Is the classifier attribute part of an old version of PyTorch, or is it something I am missing?
ResNets don’t have a classifier attribute. They use only a single fully-connected layer, and it’s available as their fc attribute.
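You can verify this by printing the layer:

print(torchvision.models.resnet18().fc)
# Linear(in_features=512, out_features=1000, bias=True)  (exact repr depends on the PyTorch version)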
@zhoubinxyz could you add a more complete example to illustrate how to fine-tune a VGG model on a custom dataset using PyTorch?
You can get the basic idea from this PR.
There is an option named --pretrained in the imagenet main.py file. May I ask: if I use this option with a custom dataset, like this:
python main.py --arch=alexnet --pretrained my_custom_dataset
what will happen with this command? It seems like it would do fine-tuning.
I think --pretrained is meant for evaluation mode. The script doesn’t support fine-tuning at the moment.
@apaszke I referenced this PR for fine-tuning. For AlexNet and VGG, the original code replaces all the fully-connected layers. May I ask:
- how can I replace only the last fully-connected layer for fine-tuning and freeze the other fully-connected layers?
- Is this forward the right way to code it? You gave some reference code above:
def forward(self, x):
    return self.last_layer(self.pretrained_model(x))
Original fine-tuning code:
class FineTuneModel(nn.Module):
    def __init__(self, original_model, arch, num_classes):
        super(FineTuneModel, self).__init__()

        if arch.startswith('alexnet'):
            self.features = original_model.features
            self.classifier = nn.Sequential(
                nn.Dropout(),
                nn.Linear(256 * 6 * 6, 4096),
                nn.ReLU(inplace=True),
                nn.Dropout(),
                nn.Linear(4096, 4096),
                nn.ReLU(inplace=True),
                nn.Linear(4096, num_classes),
            )
            self.modelName = 'alexnet'
        elif arch.startswith('resnet'):
            # Everything except the last linear layer
            self.features = nn.Sequential(*list(original_model.children())[:-1])
            self.classifier = nn.Sequential(
                nn.Linear(512, num_classes)
            )
            self.modelName = 'resnet'
        elif arch.startswith('vgg16'):
            self.features = original_model.features
            self.classifier = nn.Sequential(
                nn.Dropout(),
                nn.Linear(25088, 4096),
                nn.ReLU(inplace=True),
                nn.Dropout(),
                nn.Linear(4096, 4096),
                nn.ReLU(inplace=True),
                nn.Linear(4096, num_classes),
            )
            self.modelName = 'vgg16'
        else:
            raise ValueError("Finetuning not supported on this architecture yet")

        # Freeze those weights
        for p in self.features.parameters():
            p.requires_grad = False

    def forward(self, x):
        f = self.features(x)
        if self.modelName == 'alexnet':
            f = f.view(f.size(0), 256 * 6 * 6)
        elif self.modelName == 'vgg16':
            f = f.view(f.size(0), -1)
        elif self.modelName == 'resnet':
            f = f.view(f.size(0), -1)
        y = self.classifier(f)
        return y
This post should help you.
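To the first question above, a minimal sketch of replacing only the final fully-connected layer while freezing everything else (assuming the torchvision VGG-16, whose classifier ends in Linear(4096, 1000); num_classes is just a placeholder for your dataset's class count):

num_classes = 10  # e.g., your dataset's class count
model = torchvision.models.vgg16(pretrained=True)
for p in model.parameters():
    p.requires_grad = False  # freeze the whole network, including the other fc layers

mod = list(model.classifier.children())
new_fc = torch.nn.Linear(4096, num_classes)  # new layer, requires_grad=True by default
mod[-1] = new_fc
model.classifier = torch.nn.Sequential(*mod)

# Optimize only the new layer
optimizer = torch.optim.SGD(new_fc.parameters(), lr=1e-3, momentum=0.9)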
I added the following lines to the imagenet example, using the pretrained ResNet-18 model.
for param in model.parameters():
    param.requires_grad = False

# Replace the last fully-connected layer
# Parameters of newly constructed modules have requires_grad=True by default
model.fc = torch.nn.Linear(512, 3)

optimizer = torch.optim.SGD(model.fc.parameters(), args.lr,
                            momentum=args.momentum,
                            weight_decay=args.weight_decay)
But then I get the following error:
File "main.py", line 234, in train
loss.backward()
File "/usr/local/lib/python2.7/dist-packages/torch/autograd/variable.py", line 146, in backward
self._execution_engine.run_backward((self,), (gradient,), retain_variables)
RuntimeError: there are no graph nodes that require computing gradients
I would like to freeze all parameters of the original ResNet-18 and just learn the last layer with 3 classes. How should I do this correctly? Based on information from the forum, this should be the working version.