How to perform fine-tuning in PyTorch?

(Avijit Dasgupta) #1

Can anyone tell me how to do fine-tuning in PyTorch? Suppose I have loaded the ResNet-18 pretrained model. Now I want to fine-tune it on my own dataset, which contains, say, 10 classes. How do I remove the last output layer and replace it with one that matches my requirement?

(Adam Paszke) #2

You can find an example at the bottom of this section of the autograd mechanics notes.
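
For reference, the example there boils down to this pattern (a minimal sketch, assuming torchvision's resnet18 and a 10-class target task):

import torch.nn as nn
import torch.optim as optim
import torchvision

model = torchvision.models.resnet18(pretrained=True)

# Freeze all the pretrained weights
for param in model.parameters():
    param.requires_grad = False

# Replace the last fully-connected layer; parameters of newly
# constructed modules have requires_grad=True by default
model.fc = nn.Linear(512, 10)

# Optimize only the new classifier
optimizer = optim.SGD(model.fc.parameters(), lr=1e-2, momentum=0.9)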

(Avijit Dasgupta) #3

Thanks for your reply.

I have another doubt here. While fine-tuning, shouldn't we use a smaller learning rate for the whole network? (The example here tells us to freeze the network and only learn the weights in the newly added layers.) Which one is recommended?

(Adam Paszke) #5

In the snippet I've sent you, only the last layer is optimized. The rest will be frozen (the base parameters will receive no gradients, since they have requires_grad=False). This is as if you used a learning rate of 0 for that part. If you want to use a lower lr for the base, see this section of the optim docs.
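
Concretely, the optim docs describe per-parameter options: you pass the optimizer a list of parameter groups, each with its own settings. A minimal self-contained sketch (the split by parameter name and both learning rates are placeholder assumptions, not values from the docs):

import torch.optim as optim
import torchvision

model = torchvision.models.resnet18(pretrained=True)

# Everything except the final classifier trains at the default rate below;
# the new head gets a 10x larger rate
base_params = [p for n, p in model.named_parameters() if not n.startswith('fc.')]
optimizer = optim.SGD([
    {'params': base_params},                       # falls back to lr=1e-3
    {'params': model.fc.parameters(), 'lr': 1e-2},
], lr=1e-3, momentum=0.9)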

(Avijit Dasgupta) #6

Thanks 🙂 It helped.

(Avijit Dasgupta) #7
optimizer = torch.optim.SGD([
            {'params': model.conv1.parameters()},
            {'params': model.bn1.parameters()},
            {'params': model.relu.parameters()},
            {'params': model.maxpool.parameters()},
            {'params': model.layer1.parameters()},
            {'params': model.layer2.parameters()},
            {'params': model.layer3.parameters()},
            {'params': model.layer4.parameters()},
            {'params': model.avgpool.parameters()},
            {'params': model.fc.parameters(), 'lr': opt.lr}  # new head at the full rate
        ], lr=opt.lr*0.1, momentum=0.9)  # base layers at a 10x smaller rate

Is this the correct way of assigning different learning rates to different layers of ResNet-18? Or is there a more concise way to do it?

(Adam Paszke) #8

This should do it:

ignored_params = list(map(id, model.fc.parameters()))
base_params = filter(lambda p: id(p) not in ignored_params,
                     model.parameters())

optimizer = torch.optim.SGD([
            {'params': base_params},
            {'params': model.fc.parameters(), 'lr': opt.lr}
        ], lr=opt.lr*0.1, momentum=0.9)
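
One caveat (an assumption on my part: that you may run this under Python 3, while the snippet above is Python-2 style): there, filter returns a lazy iterator, so it is safer to materialize the base parameters up front:

ignored_params = set(map(id, model.fc.parameters()))
base_params = [p for p in model.parameters()
               if id(p) not in ignored_params]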

(Avijit Dasgupta) #9

Great! Thanks 😄

(Bin) #10

This may also help you learn how to modify certain layers without changing the other layers' parameters, and how to construct a new model.

    model = models.vgg16(pretrained=True)
    print(list(list(model.classifier.children())[1].parameters()))
    mod = list(model.classifier.children())
    mod.pop()                             # drop the final Linear(4096, 1000) ImageNet head
    mod.append(torch.nn.Linear(4096, 2))  # add a new 2-class head
    new_classifier = torch.nn.Sequential(*mod)
    print(list(list(new_classifier.children())[1].parameters()))
    model.classifier = new_classifier

Also, you may consider adding pretrained models for VGG, which you can find here:

As for fine-tuning ResNet, it is easier:

model = models.resnet18(pretrained=True)
model.fc = torch.nn.Linear(512, 2)  # resnet18's fc takes 512 input features (2048 applies to resnet50 and deeper)
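
A quick shape sanity check for either model (a sketch, assuming a recent PyTorch where modules take plain tensors):

import torch

model.eval()                      # avoid batchnorm/dropout surprises on a single example
x = torch.randn(1, 3, 224, 224)   # dummy ImageNet-sized input
print(model(x).shape)             # expected: torch.Size([1, 2])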

(Saurav Sharma) #11

How do I add new layers to existing pretrained models? Here, the last layer named fc is replaced with a Linear layer. How can I add another layer after that?

(Adam Paszke) #12
class MyModel(nn.Module):
    def __init__(self, pretrained_model):
        super(MyModel, self).__init__()  # required so submodules get registered
        self.pretrained_model = pretrained_model
        self.last_layer = ...            # create the new layer here

    def forward(self, x):
        return self.last_layer(self.pretrained_model(x))

pretrained_model = torchvision.models.resnet18(pretrained=True)
model = MyModel(pretrained_model)
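
Note that this wrapper keeps the pretrained 1000-way ImageNet head and stacks last_layer on top of it, so a concrete choice (purely as an example) would be nn.Linear(1000, num_classes). If you would rather replace the head instead of extending it, assign to model.fc as shown earlier in the thread.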

(Saurav Sharma) #13

Thank you for the help.

Another question: I am not able to see a classifier attribute when loading a pretrained ResNet-18. However, I can directly access the children attribute without needing classifier. Is the classifier attribute part of an older version of PyTorch, or am I missing something?

(Adam Paszke) #14

ResNets don’t have a classifier attribute. They use only a single fully-connected layer and it’s available as their fc attribute.
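
For example (assuming torchvision's resnet18):

import torchvision

model = torchvision.models.resnet18(pretrained=True)
print(model.fc)   # Linear(in_features=512, out_features=1000, bias=True)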

(Yili Zhao) #15

@zhoubinxyz could you add a more complete example illustrating how to fine-tune a VGG model on a custom dataset using PyTorch?

(Adam Paszke) #16

You can get the basic idea from this PR.

(Yili Zhao) #17

There is an option named --pretrained in the imagenet example. May I ask what happens if I use this option with a custom dataset, like this:
python main.py --arch=alexnet --pretrained my_custom_dataset
It seems like that would amount to fine-tuning.

(Adam Paszke) #18

I think --pretrained is meant for evaluation mode. The script doesn’t support finetuning at the moment.

(Yili Zhao) #19

@apaszke I referenced this PR for fine-tuning. For AlexNet and VGG, the original code replaces all the fully-connected layers. May I ask:

  • How can I replace only the last fully-connected layer for fine-tuning and freeze the other fully-connected layers? (See the sketch after the code below.)
  • Is forward written the right way? Because you gave some reference code above:

def forward(self, x):
    return self.last_layer(self.pretrained_model(x))

Original fine-tuning code:

class FineTuneModel(nn.Module):
    def __init__(self, original_model, arch, num_classes):
        super(FineTuneModel, self).__init__()

        if arch.startswith('alexnet'):
            self.features = original_model.features
            self.classifier = nn.Sequential(
                nn.Linear(256 * 6 * 6, 4096),
                nn.Linear(4096, 4096),
                nn.Linear(4096, num_classes),
            )
            self.modelName = 'alexnet'
        elif arch.startswith('resnet'):
            # Everything except the last linear layer
            self.features = nn.Sequential(*list(original_model.children())[:-1])
            self.classifier = nn.Sequential(
                nn.Linear(512, num_classes)
            )
            self.modelName = 'resnet'
        elif arch.startswith('vgg16'):
            self.features = original_model.features
            self.classifier = nn.Sequential(
                nn.Linear(25088, 4096),
                nn.Linear(4096, 4096),
                nn.Linear(4096, num_classes),
            )
            self.modelName = 'vgg16'
        else:
            raise ValueError("Finetuning not supported on this architecture yet")

        # Freeze those weights
        for p in self.features.parameters():
            p.requires_grad = False

    def forward(self, x):
        f = self.features(x)
        if self.modelName == 'alexnet' :
            f = f.view(f.size(0), 256 * 6 * 6)
        elif self.modelName == 'vgg16':
            f = f.view(f.size(0), -1)
        elif self.modelName == 'resnet' :
            f = f.view(f.size(0), -1)
        y = self.classifier(f)
        return y
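
Regarding the first bullet, a minimal sketch of replacing only the last fully-connected layer while freezing everything else, assuming torchvision's vgg16 (where classifier[6] is the final Linear):

import torch.nn as nn
import torchvision.models as models

num_classes = 10   # placeholder for your own number of classes
model = models.vgg16(pretrained=True)

# Freeze every pretrained parameter, including the earlier FC layers
for p in model.parameters():
    p.requires_grad = False

# Swap in a fresh head; its parameters require grad by default
model.classifier[6] = nn.Linear(4096, num_classes)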

(Adam Paszke) #20

This post should help you.

(Bartosz Ludwiczuk) #21

I added the following lines to the imagenet example, using the pretrained resnet18 model.

for param in model.parameters():
    param.requires_grad = False

# Replace the last fully-connected layer
# Parameters of newly constructed modules have requires_grad=True by default
model.fc = torch.nn.Linear(512, 3)

optimizer = torch.optim.SGD(model.fc.parameters(), args.lr,
                            momentum=args.momentum, weight_decay=args.weight_decay)

But then I have following error:

File "", line 234, in train
File "/usr/local/lib/python2.7/dist-packages/torch/autograd/", line 146, in backward
    self._execution_engine.run_backward((self,), (gradient,), retain_variables)
RuntimeError: there are no graph nodes that require computing gradients

I would like to freeze all the parameters of the original ResNet-18 and just learn the last layer with 3 classes. How should I do this correctly? Based on information from the forum, this should be the working version.