How to perform finetuning in Pytorch?

optimizer = torch.optim.SGD([
            {'params': model.conv1.parameters()},
            {'params': model.bn1.parameters()},
            {'params': model.relu.parameters()},
            {'params': model.maxpool.parameters()},
            {'params': model.layer1.parameters()},
            {'params': model.layer2.parameters()},
            {'params': model.layer3.parameters()},
            {'params': model.layer4.parameters()},
            {'params': model.avgpool.parameters()},
            {'params': model.fc.parameters(), 'lr':}
        ],*0.1, momentum=0.9)

Is this the correct way of defining different learning rate to different layers of ResNet18? Or is there any other optimized way to do that?

This should do it:

ignored_params = list(map(id, model.fc.parameters()))
base_params = filter(lambda p: id(p) not in ignored_params,

optimizer = torch.optim.SGD([
            {'params': base_params},
            {'params': model.fc.parameters(), 'lr':}
        ],*0.1, momentum=0.9)


Great! Thanks :smile:

This may also help to learn how to modify layers without changing other layers’ parameters and construct a new model.

    model = models.vgg16(pretrained=True)
    print list(list(model.classifier.children())[1].parameters())
    mod = list(model.classifier.children())
    mod.append(torch.nn.Linear(4096, 2))
    new_classifier = torch.nn.Sequential(*mod)
    print list(list(new_classifier.children())[1].parameters())
    model.classifier = new_classifier

Also, you may consider to add pretrained models for VGG, which you may found here:

As for finetuning resnet, it is more easy:

model = models.resnet18(pretrained=True)
model.fc = torch.nn.Linear(2048, 2)

How do I add new layers to existing pretrained models? Here, the last layer by name is replaced with a Linear layer. How to add another layer after that?

class MyModel(nn.Module):
    def __init__(self, pretrained_model):
        self.pretrained_model = pretrained_model
        self.last_layer = ... # create layer

    def forward(self, x):
        return self.last_layer(self.pretrained_model(x))

pretrained_model = torchvision.models.resnet18(pretrained=True)
model = MyModel(pretrained_model)

Thank you for the help.

Another question, I am not able to see classifier attribute when loading a pretrained ResNet-18. However, I can directly access the children attribute without needing classifier. Is the classifier attribute part of old version of PyTorch or something I am missing?

ResNets don’t have a classifier attribute. They use only a single fully-connected layer and it’s available as their fc attribute.

1 Like

@zhoubinxyz may you add a more complete example to illustrate how to fine-tuning VGG model on a custom dataset using PyTorch?

1 Like

You can get the basic idea form this PR.

1 Like

There is an option named --pretrained in the imagenet file. May I ask that if I use this option with a custom dataset like these:
python --arch=alexnet --pretrained my_custom_dataset
What will happen with this command? It seems that like a fine-tuning.

I think --pretrained is meant for evaluation mode. The script doesn’t support finetuning at the moment.

1 Like

@apaszke I reference this PR for fine-tuning. For alexnet and vggnet, the original code replay all the fully-connected layers. May I ask:

  • how can I only replace the last fully-connected layer for fine-tuning and freeze other fully-connected layers?
  • Is the forward the right way to code? Because you give some reference code above:

def forward(self, x):
return self.last_layer(self.pretrained_model(x))

Original fine-tuing code:

class FineTuneModel(nn.Module):
    def __init__(self, original_model, arch, num_classes):
        super(FineTuneModel, self).__init__()

        if arch.startswith('alexnet') :
            self.features = original_model.features
            self.classifier = nn.Sequential(
                nn.Linear(256 * 6 * 6, 4096),
                nn.Linear(4096, 4096),
                nn.Linear(4096, num_classes),
            self.modelName = 'alexnet'
        elif arch.startswith('resnet') :
            # Everything except the last linear layer
            self.features = nn.Sequential(*list(original_model.children())[:-1])
            self.classifier = nn.Sequential(
                nn.Linear(512, num_classes)
            self.modelName = 'resnet'
        elif arch.startswith('vgg16'):
            self.features = original_model.features
            self.classifier = nn.Sequential(
                nn.Linear(25088, 4096),
                nn.Linear(4096, 4096),
                nn.Linear(4096, num_classes),
            self.modelName = 'vgg16'
        else :
            raise("Finetuning not supported on this architecture yet")

        # Freeze those weights
        for p in self.features.parameters():
            p.requires_grad = False

    def forward(self, x):
        f = self.features(x)
        if self.modelName == 'alexnet' :
            f = f.view(f.size(0), 256 * 6 * 6)
        elif self.modelName == 'vgg16':
            f = f.view(f.size(0), -1)
        elif self.modelName == 'resnet' :
            f = f.view(f.size(0), -1)
        y = self.classifier(f)
        return y
1 Like

This post should help you.

1 Like

I added following lines to imagenet example, using pretrained model of resnet18.

 for param in model.parameters():
      param.requires_grad = False

 # Replace the last fully-connected layer
 # Parameters of newly constructed modules have requires_grad=True by default
 model.fc = torch.nn.Linear(512, 3)

 optimizer = torch.optim.SGD(model.fc.parameters(),,

But then I have following error:

File "", line 234, in train
File "/usr/local/lib/python2.7/dist-packages/torch/autograd/", line 146, in backward
    self._execution_engine.run_backward((self,), (gradient,), retain_variables)
RuntimeError: there are no graph nodes that require computing gradients

I would like to freeze all parameters of original ResNet18 and just learn the last layer with 3 classes. How I should do this correctly? Based on information from the forum, this should we the working version.

That should work. Can you post the entire code, just to check if there is some error there and maybe trying to run it here?

Here is the full code:

It is original ImageNet example with some elements of Visdom, which I tried to use. So running it on your own should stat Visdom server or delete some lines.


Do you know how can I change the cost function while finetuning a pre-trained model (like ResNet-18, VGG-16) or create a customized cost function and use it?


the cost function lies outside the model definition, change it like you would change the cost function as usual.

What is the learning rate for base_params and fc.parameters() in this example How to perform finetuning in Pytorch?? @apaszke