How to perform finetuning in Pytorch?

(Avijit Dasgupta) #1

Can anyone tell me how to do fine-tuning in PyTorch? Suppose I have loaded the pretrained ResNet-18 model. Now I want to fine-tune it on my own dataset, which contains, say, 10 classes. How do I remove the last output layer and replace it with one that fits my requirements?

(Adam Paszke) #2

You can find an example at the bottom of this section of the autograd mechanics notes.
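A minimal sketch of that freeze-then-replace pattern for 10 classes. To keep it runnable without downloading weights, it assumes a small hypothetical stand-in, `TinyBackbone`, that only mimics ResNet's layout (a feature extractor followed by a final `fc` layer). With torchvision you would load `torchvision.models.resnet18(pretrained=True)` (or use the `weights=` argument in newer torchvision releases) instead, and the remaining steps stay the same.

```python
import torch
import torch.nn as nn


class TinyBackbone(nn.Module):
    """Hypothetical stand-in for a pretrained ResNet: features + a final `fc` layer."""

    def __init__(self, num_classes=1000):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 8, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(8, num_classes)

    def forward(self, x):
        x = self.features(x)
        return self.fc(torch.flatten(x, 1))


model = TinyBackbone()  # pretend these weights are pretrained

# 1. Freeze every existing parameter.
for param in model.parameters():
    param.requires_grad = False

# 2. Replace the output layer; a freshly constructed layer has requires_grad=True.
model.fc = nn.Linear(model.fc.in_features, 10)

# 3. Give the optimizer only the parameters that still require gradients.
optimizer = torch.optim.SGD(
    [p for p in model.parameters() if p.requires_grad], lr=0.01, momentum=0.9
)

logits = model(torch.randn(4, 3, 32, 32))
print(logits.shape)  # torch.Size([4, 10])
```

Reading `model.fc.in_features` avoids hardcoding the width of the old layer; the learning rate and momentum are illustrative values only.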

(Avijit Dasgupta) #3

Thanks for your reply.

I have another doubt here. While fine-tuning, shouldn't we use a smaller learning rate for the whole network? (Here, the example tells us to freeze the network and only learn the weights of the newly added layers.) Which one is recommended?

(Adam Paszke) #5

In the snippet I’ve sent you, only the last layer is optimized. The rest will be frozen (base parameters will receive no gradients - they have requires_grad=False). This is as if you used a learning rate of 0 for that part. If you want to use a lower lr for the base, see this section of the optim docs.
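A small check of the "no gradients" point, using a hypothetical two-layer model (a `base` and a new `head`) in place of a real network: after a backward pass and an optimizer step, the frozen weights have no `.grad` and are left unchanged, which is why freezing behaves like a learning rate of 0 for that part.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical model: a frozen "base" followed by a trainable "head".
base = nn.Linear(4, 4)
head = nn.Linear(4, 2)
model = nn.Sequential(base, head)

for p in base.parameters():
    p.requires_grad = False  # autograd will not compute gradients for these

optimizer = torch.optim.SGD(head.parameters(), lr=0.1)
base_before = base.weight.clone()

loss = model(torch.randn(8, 4)).pow(2).mean()
loss.backward()
optimizer.step()

print(base.weight.grad)                       # None: no gradient was computed
print(head.weight.grad is not None)           # True
print(torch.equal(base.weight, base_before))  # True: base weights unchanged
```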

(Avijit Dasgupta) #6

Thanks :slight_smile: It helped.

(Avijit Dasgupta) #7
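import torch
import torchvision

model = torchvision.models.resnet18(pretrained=True)
base_lr = 0.01  # illustrative base learning rate

optimizer = torch.optim.SGD([
            {'params': model.conv1.parameters()},
            {'params': model.bn1.parameters()},
            {'params': model.relu.parameters()},     # ReLU has no parameters
            {'params': model.maxpool.parameters()},  # MaxPool2d has no parameters
            {'params': model.layer1.parameters()},
            {'params': model.layer2.parameters()},
            {'params': model.layer3.parameters()},
            {'params': model.layer4.parameters()},
            {'params': model.avgpool.parameters()},  # pooling has no parameters
            {'params': model.fc.parameters(), 'lr': base_lr}
        ], lr=base_lr*0.1, momentum=0.9)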

Is this the correct way to define different learning rates for different layers of ResNet-18? Or is there a more concise way to do that?

(Adam Paszke) #8

This should do it:

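import torch

base_lr = 0.01  # illustrative base learning rate

# Collect the ids of the fc parameters so they can be excluded from the base group
ignored_params = list(map(id, model.fc.parameters()))
base_params = filter(lambda p: id(p) not in ignored_params,
                     model.parameters())

optimizer = torch.optim.SGD([
            {'params': base_params},
            {'params': model.fc.parameters(), 'lr': base_lr}
        ], lr=base_lr*0.1, momentum=0.9)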
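A self-contained sketch of the same two-group setup, assuming a hypothetical `Net` with an `fc` attribute in place of ResNet-18 and an illustrative `base_lr`. It uses a set and a list comprehension instead of `map`/`filter`, which should be equivalent. It also shows why the fc parameters have to be excluded from the base group: `torch.optim` refuses a parameter that appears in more than one group.

```python
import torch
import torch.nn as nn


class Net(nn.Module):
    """Hypothetical stand-in with an `fc` attribute, like torchvision's ResNet."""

    def __init__(self):
        super().__init__()
        self.body = nn.Linear(16, 8)
        self.fc = nn.Linear(8, 10)

    def forward(self, x):
        return self.fc(torch.relu(self.body(x)))


model = Net()
base_lr = 0.01  # illustrative value

fc_ids = {id(p) for p in model.fc.parameters()}
base_params = [p for p in model.parameters() if id(p) not in fc_ids]

optimizer = torch.optim.SGD(
    [
        {"params": base_params},                           # falls back to the default lr below
        {"params": model.fc.parameters(), "lr": base_lr},  # full lr for the new head
    ],
    lr=base_lr * 0.1,
    momentum=0.9,
)

for group in optimizer.param_groups:
    print(len(group["params"]), group["lr"])

# Using model.parameters() for the base group would put the fc parameters
# in two groups, which the optimizer rejects with a ValueError.
rejected = False
try:
    torch.optim.SGD(
        [{"params": model.parameters()},
         {"params": model.fc.parameters(), "lr": base_lr}],
        lr=base_lr * 0.1,
    )
except ValueError as err:
    rejected = True
    print("rejected:", err)
```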

(Avijit Dasgupta) #9

Great! Thanks :smile:

(Bin) #10

This may also help you learn how to modify layers without changing the other layers’ parameters, and how to construct a new model.

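    import torch
    from torchvision import models

    model = models.vgg16(pretrained=True)
    # Parameters of the first Linear layer (index 0 in current torchvision)
    print(list(list(model.classifier.children())[0].parameters()))
    mod = list(model.classifier.children())
    mod.pop()  # drop the original 1000-way output layer
    mod.append(torch.nn.Linear(4096, 2))
    new_classifier = torch.nn.Sequential(*mod)
    # Same layer in the new classifier: its parameters are unchanged
    print(list(list(new_classifier.children())[0].parameters()))
    model.classifier = new_classifier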
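Since `nn.Sequential` supports item assignment, an alternative to rebuilding the list of children is to overwrite the last entry in place. A minimal sketch, assuming a scaled-down hypothetical stand-in for VGG's `classifier` (the real VGG16 one goes 25088 -> 4096 -> 4096 -> 1000) so it runs quickly:

```python
import torch
import torch.nn as nn

# Scaled-down, hypothetical stand-in for VGG's `classifier`.
classifier = nn.Sequential(
    nn.Linear(64, 32), nn.ReLU(inplace=True), nn.Dropout(),
    nn.Linear(32, 32), nn.ReLU(inplace=True), nn.Dropout(),
    nn.Linear(32, 1000),
)

old_hidden = classifier[3].weight.clone()

# Overwrite only the last layer; negative indices work with nn.Sequential.
classifier[-1] = nn.Linear(classifier[-1].in_features, 2)

print(classifier[-1])                                 # Linear(in_features=32, out_features=2, bias=True)
print(torch.equal(classifier[3].weight, old_hidden))  # True: earlier layers untouched
```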

You may also consider adding pretrained models for VGG, which you can find here:

As for fine-tuning ResNet, it is even easier:

model = models.resnet18(pretrained=True)
model.fc = torch.nn.Linear(512, 2)  # ResNet-18's fc takes 512 input features
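The input width of `fc` depends on the depth: ResNet-18 and ResNet-34 end in 512 features, while ResNet-50 and deeper end in 2048. A small sketch that reads `in_features` rather than hardcoding the width; `replace_fc` and `Head` are hypothetical names used only for illustration, with `Head` standing in for the last layer of each network.

```python
import torch.nn as nn


def replace_fc(model: nn.Module, num_classes: int) -> nn.Module:
    """Hypothetical helper: swap a ResNet-style `fc` head, reading its input width."""
    model.fc = nn.Linear(model.fc.in_features, num_classes)
    return model


class Head(nn.Module):
    """Hypothetical stand-in exposing only a ResNet-style final `fc` layer."""

    def __init__(self, width):
        super().__init__()
        self.fc = nn.Linear(width, 1000)


r18 = replace_fc(Head(512), 2)   # like ResNet-18
r50 = replace_fc(Head(2048), 2)  # like ResNet-50
print(r18.fc)  # Linear(in_features=512, out_features=2, bias=True)
print(r50.fc)  # Linear(in_features=2048, out_features=2, bias=True)
```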

(Saurav Sharma) #11

How do I add new layers to an existing pretrained model? Here, the last layer is replaced by name with a Linear layer. How do I add another layer after that?

(Adam Paszke) #12
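import torch.nn as nn
import torchvision

class MyModel(nn.Module):
    def __init__(self, pretrained_model):
        super(MyModel, self).__init__()  # must run before submodules are assigned
        self.pretrained_model = pretrained_model
        self.last_layer = nn.Linear(1000, 10)  # create layer, e.g. 1000 ImageNet logits -> 10 classes

    def forward(self, x):
        return self.last_layer(self.pretrained_model(x))

pretrained_model = torchvision.models.resnet18(pretrained=True)
model = MyModel(pretrained_model)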
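For a simple chain like this, `nn.Sequential` can do the same wrapping without a custom class. A minimal sketch, assuming a hypothetical stand-in for the pretrained network that outputs 1000 ImageNet-style logits; the `ReLU` and the 10-class `Linear` are illustrative choices, not part of the answer above.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for a pretrained network producing 1000 logits.
pretrained_model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 1000))

# Stack extra layers after the pretrained output.
model = nn.Sequential(
    pretrained_model,
    nn.ReLU(),
    nn.Linear(1000, 10),  # example: 10 target classes
)

out = model(torch.randn(2, 3, 8, 8))
print(out.shape)  # torch.Size([2, 10])
```

A custom `nn.Module` like `MyModel` is still the more flexible choice when the forward pass needs anything beyond a straight chain.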

(Saurav Sharma) #13

Thank you for the help.

Another question: I don't see a classifier attribute when loading a pretrained ResNet-18. However, I can access its children directly without going through classifier. Is the classifier attribute part of an old version of PyTorch, or am I missing something?

(Adam Paszke) #14

ResNets don’t have a classifier attribute. They use only a single fully-connected layer and it’s available as their fc attribute.
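A quick way to see which top-level attributes a model exposes (and so whether to edit `fc` or `classifier`) is `named_children()`. The sketch below uses hypothetical stand-ins that mimic each family's layout; on a real torchvision ResNet-18 the same loop lists `conv1`, `bn1`, `relu`, `maxpool`, `layer1` to `layer4`, `avgpool` and `fc`, while VGG lists `features` and `classifier` (plus `avgpool` in newer torchvision versions).

```python
import torch.nn as nn

# Hypothetical stand-ins mimicking the top-level layout of each family.
resnet_like = nn.Module()
resnet_like.conv1 = nn.Conv2d(3, 8, 3)
resnet_like.avgpool = nn.AdaptiveAvgPool2d(1)
resnet_like.fc = nn.Linear(8, 1000)

vgg_like = nn.Module()
vgg_like.features = nn.Sequential(nn.Conv2d(3, 8, 3), nn.ReLU())
vgg_like.classifier = nn.Sequential(nn.Linear(8, 1000))

for name, model in [("resnet_like", resnet_like), ("vgg_like", vgg_like)]:
    print(name, [child for child, _ in model.named_children()])
# resnet_like ['conv1', 'avgpool', 'fc']
# vgg_like ['features', 'classifier']

print(hasattr(resnet_like, "classifier"))  # False
```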

(Yili Zhao) #15

@zhoubinxyz could you add a more complete example illustrating how to fine-tune a VGG model on a custom dataset using PyTorch?

(Adam Paszke) #16

You can get the basic idea from this PR.

(Yili Zhao) #17

There is an option named --pretrained in the imagenet file. If I use this option with a custom dataset, like this:
python --arch=alexnet --pretrained my_custom_dataset
what will happen with this command? It seems like fine-tuning.

(Adam Paszke) #18

I think --pretrained is meant for evaluation mode. The script doesn’t support finetuning at the moment.

(Yili Zhao) #19

@apaszke I referred to this PR for fine-tuning. For AlexNet and VGG, the original code replaces all the fully-connected layers. May I ask:

  • How can I replace only the last fully-connected layer for fine-tuning and freeze the other fully-connected layers?
  • Is this forward the right way to write it? You gave some reference code above:

def forward(self, x):
    return self.last_layer(self.pretrained_model(x))

Original fine-tuning code:

import torch.nn as nn

class FineTuneModel(nn.Module):
    def __init__(self, original_model, arch, num_classes):
        super(FineTuneModel, self).__init__()

        if arch.startswith('alexnet'):
            self.features = original_model.features
            # ReLU between layers; without it the stacked Linear layers collapse into one linear map
            self.classifier = nn.Sequential(
                nn.Linear(256 * 6 * 6, 4096),
                nn.ReLU(inplace=True),
                nn.Linear(4096, 4096),
                nn.ReLU(inplace=True),
                nn.Linear(4096, num_classes),
            )
            self.modelName = 'alexnet'
        elif arch.startswith('resnet'):
            # Everything except the last linear layer
            self.features = nn.Sequential(*list(original_model.children())[:-1])
            # 512 matches ResNet-18/34; deeper ResNets output 2048 features
            self.classifier = nn.Sequential(
                nn.Linear(512, num_classes)
            )
            self.modelName = 'resnet'
        elif arch.startswith('vgg16'):
            self.features = original_model.features
            self.classifier = nn.Sequential(
                nn.Linear(25088, 4096),
                nn.ReLU(inplace=True),
                nn.Linear(4096, 4096),
                nn.ReLU(inplace=True),
                nn.Linear(4096, num_classes),
            )
            self.modelName = 'vgg16'
        else:
            raise ValueError("Finetuning not supported on this architecture yet")

        # Freeze those weights
        for p in self.features.parameters():
            p.requires_grad = False

    def forward(self, x):
        f = self.features(x)
        if self.modelName == 'alexnet':
            f = f.view(f.size(0), 256 * 6 * 6)
        elif self.modelName == 'vgg16':
            f = f.view(f.size(0), -1)
        elif self.modelName == 'resnet':
            f = f.view(f.size(0), -1)
        y = self.classifier(f)
        return y
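For the first bullet (replace only the last fully-connected layer and freeze the others), one option, rather than rebuilding the classifier as `FineTuneModel` does, is to freeze every parameter and then swap only the final entry of `classifier`. A minimal sketch, assuming a scaled-down hypothetical stand-in `VGGLike` with VGG's `features` / `classifier` layout and an illustrative `num_classes = 5`:

```python
import torch
import torch.nn as nn


class VGGLike(nn.Module):
    """Hypothetical, scaled-down stand-in with VGG's `features` / `classifier` layout."""

    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 4, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(2)
        )
        self.classifier = nn.Sequential(
            nn.Linear(4 * 2 * 2, 32), nn.ReLU(inplace=True), nn.Dropout(),
            nn.Linear(32, 32), nn.ReLU(inplace=True), nn.Dropout(),
            nn.Linear(32, 1000),
        )

    def forward(self, x):
        return self.classifier(torch.flatten(self.features(x), 1))


num_classes = 5
model = VGGLike()

# Freeze everything: conv features and all existing fully-connected layers.
for p in model.parameters():
    p.requires_grad = False

# Replace only the last fully-connected layer; the new layer is trainable.
model.classifier[-1] = nn.Linear(model.classifier[-1].in_features, num_classes)

optimizer = torch.optim.SGD(model.classifier[-1].parameters(), lr=0.01, momentum=0.9)

trainable = [n for n, p in model.named_parameters() if p.requires_grad]
print(trainable)                             # ['classifier.6.weight', 'classifier.6.bias']
print(model(torch.randn(2, 3, 8, 8)).shape)  # torch.Size([2, 5])
```

Because the model's own `forward` is reused, no new `forward` has to be written in this approach.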

(Adam Paszke) #20

This post should help you.

(Bartosz Ludwiczuk) #21

I added the following lines to the imagenet example, using a pretrained ResNet-18 model.

for param in model.parameters():
    param.requires_grad = False

# Replace the last fully-connected layer
# Parameters of newly constructed modules have requires_grad=True by default
model.fc = torch.nn.Linear(512, 3)

optimizer = torch.optim.SGD(model.fc.parameters(), lr=0.01, momentum=0.9)  # illustrative hyperparameters

But then I get the following error:

File "", line 234, in train
File "/usr/local/lib/python2.7/dist-packages/torch/autograd/", line 146, in backward
    self._execution_engine.run_backward((self,), (gradient,), retain_variables)
RuntimeError: there are no graph nodes that require computing gradients

I would like to freeze all parameters of the original ResNet-18 and learn only the last layer, with 3 classes. How should I do this correctly? Based on information from the forum, this should be the working version.
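One thing that may be worth checking, offered as an assumption rather than a confirmed diagnosis: if `model` has already been wrapped in a container that forwards to an inner module (as `torch.nn.DataParallel` does through its `module` attribute), then `model.fc = ...` attaches a new layer to the wrapper that `forward` never calls. Every parameter actually used is then still frozen, the output does not require grad, and `backward()` fails. The sketch below reproduces that with hypothetical `Wrapper` and `Net` stand-ins, and shows that replacing the inner module's `fc` lets gradients reach only the new layer.

```python
import torch
import torch.nn as nn


class Wrapper(nn.Module):
    """Hypothetical stand-in for a container that forwards to self.module."""

    def __init__(self, module):
        super().__init__()
        self.module = module

    def forward(self, x):
        return self.module(x)


class Net(nn.Module):
    """Hypothetical stand-in for a pretrained ResNet with an `fc` head."""

    def __init__(self):
        super().__init__()
        self.body = nn.Linear(16, 8)
        self.fc = nn.Linear(8, 1000)

    def forward(self, x):
        return self.fc(torch.relu(self.body(x)))


def frozen_wrapped():
    model = Wrapper(Net())
    for p in model.parameters():
        p.requires_grad = False
    return model


x = torch.randn(4, 16)

# Pitfall: registers a new, unused `fc` on the wrapper; the frozen inner fc still runs.
wrong = frozen_wrapped()
wrong.fc = nn.Linear(8, 3)
print(wrong(x).requires_grad)  # False, so backward() would fail

# Replace the layer on the inner module that forward() actually calls.
right = frozen_wrapped()
right.module.fc = nn.Linear(8, 3)
out = right(x)
print(out.requires_grad)  # True
out.sum().backward()
print(right.module.fc.weight.grad is not None)  # True
print(right.module.body.weight.grad)            # None: body stays frozen
```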