How to perform fine-tuning in PyTorch?

(Avijit Dasgupta) #1

Can anyone tell me how to do fine-tuning in PyTorch? Suppose I have loaded the ResNet-18 pretrained model. Now I want to fine-tune it on my own dataset, which contains, say, 10 classes. How do I remove the last output layer and replace it with one that matches my requirement?

(Adam Paszke) #2

You can find an example at the bottom of this section of the autograd mechanics notes.
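
For reference, the example there boils down to this pattern (a minimal sketch, assuming torchvision's resnet18 and a 10-class target task):

import torch.nn as nn
import torch.optim as optim
import torchvision

model = torchvision.models.resnet18(pretrained=True)

# Freeze all the pretrained weights
for param in model.parameters():
    param.requires_grad = False

# Replace the last fully-connected layer; parameters of newly
# constructed modules have requires_grad=True by default
model.fc = nn.Linear(512, 10)

# Optimize only the new classifier
optimizer = optim.SGD(model.fc.parameters(), lr=1e-2, momentum=0.9)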

(Avijit Dasgupta) #3

Thanks for your reply.

I have another doubt here. While fine-tuning, shouldn't we use a smaller learning rate for the whole network? (The example here tells us to freeze the network and only learn the weights in the newly added layers.) Which one is recommended?

(Adam Paszke) #5

In the snippet I've sent you, only the last layer is optimized. The rest will be frozen (the base parameters will receive no gradients, since they have requires_grad=False). This is as if you used a learning rate of 0 for that part. If you want to use a lower lr for the base, see this section of the optim docs.
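
Concretely, the optim docs describe per-parameter options: you pass the optimizer a list of parameter groups, each with its own settings. A minimal self-contained sketch (the split by parameter name and both learning rates are placeholder assumptions, not values from the docs):

import torch.optim as optim
import torchvision

model = torchvision.models.resnet18(pretrained=True)

# Everything except the final classifier trains at the default rate below;
# the new head gets a 10x larger rate
base_params = [p for n, p in model.named_parameters() if not n.startswith('fc.')]
optimizer = optim.SGD([
    {'params': base_params},                       # falls back to lr=1e-3
    {'params': model.fc.parameters(), 'lr': 1e-2},
], lr=1e-3, momentum=0.9)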

(Avijit Dasgupta) #6

Thanks 🙂 It helped.

(Avijit Dasgupta) #7
optimizer = torch.optim.SGD([
            {'params': model.conv1.parameters()},
            {'params': model.bn1.parameters()},
            {'params': model.relu.parameters()},
            {'params': model.maxpool.parameters()},
            {'params': model.layer1.parameters()},
            {'params': model.layer2.parameters()},
            {'params': model.layer3.parameters()},
            {'params': model.layer4.parameters()},
            {'params': model.avgpool.parameters()},
            {'params': model.fc.parameters(), 'lr': opt.lr}  # new head at the full rate
        ], lr=opt.lr*0.1, momentum=0.9)  # base layers at a 10x smaller rate

Is this the correct way of assigning different learning rates to different layers of ResNet-18? Or is there a more concise way to do it?

(Adam Paszke) #8

This should do it:

ignored_params = list(map(id, model.fc.parameters()))
base_params = filter(lambda p: id(p) not in ignored_params,
                     model.parameters())

optimizer = torch.optim.SGD([
            {'params': base_params},
            {'params': model.fc.parameters(), 'lr': opt.lr}
        ], lr=opt.lr*0.1, momentum=0.9)
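
One caveat (an assumption on my part: that you may run this under Python 3, while the snippet above is Python-2 style): there, filter returns a lazy iterator, so it is safer to materialize the base parameters up front:

ignored_params = set(map(id, model.fc.parameters()))
base_params = [p for p in model.parameters()
               if id(p) not in ignored_params]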

(Avijit Dasgupta) #9

Great! Thanks 😄

(Bin) #10

This may also help you learn how to modify certain layers without changing the other layers' parameters, and how to construct a new model.

    model = models.vgg16(pretrained=True)
    print(list(list(model.classifier.children())[1].parameters()))
    mod = list(model.classifier.children())
    mod.pop()                             # drop the final Linear(4096, 1000) ImageNet head
    mod.append(torch.nn.Linear(4096, 2))  # add a new 2-class head
    new_classifier = torch.nn.Sequential(*mod)
    print(list(list(new_classifier.children())[1].parameters()))
    model.classifier = new_classifier

Also, you may consider adding pretrained models for VGG, which you can find here:

As for fine-tuning ResNet, it is easier:

model = models.resnet18(pretrained=True)
model.fc = torch.nn.Linear(512, 2)  # resnet18's fc takes 512 input features (2048 applies to resnet50 and deeper)
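
A quick shape sanity check for either model (a sketch, assuming a recent PyTorch where modules take plain tensors):

import torch

model.eval()                      # avoid batchnorm/dropout surprises on a single example
x = torch.randn(1, 3, 224, 224)   # dummy ImageNet-sized input
print(model(x).shape)             # expected: torch.Size([1, 2])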

(Saurav Sharma) #11

How do I add new layers to existing pretrained models? Here, the last layer named fc is replaced with a Linear layer. How can I add another layer after that?

(Adam Paszke) #12
class MyModel(nn.Module):
    def __init__(self, pretrained_model):
        super(MyModel, self).__init__()  # required so submodules get registered
        self.pretrained_model = pretrained_model
        self.last_layer = ...            # create the new layer here

    def forward(self, x):
        return self.last_layer(self.pretrained_model(x))

pretrained_model = torchvision.models.resnet18(pretrained=True)
model = MyModel(pretrained_model)
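
Note that this wrapper keeps the pretrained 1000-way ImageNet head and stacks last_layer on top of it, so a concrete choice (purely as an example) would be nn.Linear(1000, num_classes). If you would rather replace the head instead of extending it, assign to model.fc as shown earlier in the thread.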

(Saurav Sharma) #13

Thank you for the help.

Another question: I am not able to see a classifier attribute when loading a pretrained ResNet-18. However, I can directly access the children attribute without needing classifier. Is the classifier attribute part of an older version of PyTorch, or am I missing something?

(Adam Paszke) #14

ResNets don’t have a classifier attribute. They use only a single fully-connected layer and it’s available as their fc attribute.
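
For example (assuming torchvision's resnet18):

import torchvision

model = torchvision.models.resnet18(pretrained=True)
print(model.fc)   # Linear(in_features=512, out_features=1000, bias=True)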

(Yili Zhao) #15

@zhoubinxyz could you add a more complete example illustrating how to fine-tune a VGG model on a custom dataset using PyTorch?

(Adam Paszke) #16

You can get the basic idea from this PR.

(Yili Zhao) #17

There is an option named --pretrained in the imagenet example. May I ask what happens if I use this option with a custom dataset, like this:
python main.py --arch=alexnet --pretrained my_custom_dataset
It seems like that would amount to fine-tuning.

(Adam Paszke) #18

I think --pretrained is meant for evaluation mode. The script doesn’t support finetuning at the moment.

(Yili Zhao) #19

@apaszke I referenced this PR for fine-tuning. For AlexNet and VGG, the original code replaces all the fully-connected layers. May I ask:

  • How can I replace only the last fully-connected layer for fine-tuning and freeze the other fully-connected layers? (See the sketch after the code below.)
  • Is forward written the right way? Because you gave some reference code above:

def forward(self, x):
    return self.last_layer(self.pretrained_model(x))

Original fine-tuning code:

class FineTuneModel(nn.Module):
    def __init__(self, original_model, arch, num_classes):
        super(FineTuneModel, self).__init__()

        if arch.startswith('alexnet'):
            self.features = original_model.features
            self.classifier = nn.Sequential(
                nn.Linear(256 * 6 * 6, 4096),
                nn.Linear(4096, 4096),
                nn.Linear(4096, num_classes),
            )
            self.modelName = 'alexnet'
        elif arch.startswith('resnet'):
            # Everything except the last linear layer
            self.features = nn.Sequential(*list(original_model.children())[:-1])
            self.classifier = nn.Sequential(
                nn.Linear(512, num_classes)
            )
            self.modelName = 'resnet'
        elif arch.startswith('vgg16'):
            self.features = original_model.features
            self.classifier = nn.Sequential(
                nn.Linear(25088, 4096),
                nn.Linear(4096, 4096),
                nn.Linear(4096, num_classes),
            )
            self.modelName = 'vgg16'
        else:
            raise ValueError("Finetuning not supported on this architecture yet")

        # Freeze those weights
        for p in self.features.parameters():
            p.requires_grad = False

    def forward(self, x):
        f = self.features(x)
        if self.modelName == 'alexnet' :
            f = f.view(f.size(0), 256 * 6 * 6)
        elif self.modelName == 'vgg16':
            f = f.view(f.size(0), -1)
        elif self.modelName == 'resnet' :
            f = f.view(f.size(0), -1)
        y = self.classifier(f)
        return y
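
Regarding the first bullet, a minimal sketch of replacing only the last fully-connected layer while freezing everything else, assuming torchvision's vgg16 (where classifier[6] is the final Linear):

import torch.nn as nn
import torchvision.models as models

num_classes = 10   # placeholder for your own number of classes
model = models.vgg16(pretrained=True)

# Freeze every pretrained parameter, including the earlier FC layers
for p in model.parameters():
    p.requires_grad = False

# Swap in a fresh head; its parameters require grad by default
model.classifier[6] = nn.Linear(4096, num_classes)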

(Adam Paszke) #20

This post should help you.

(Bartosz Ludwiczuk) #21

I added the following lines to the imagenet example, using the pretrained resnet18 model.

for param in model.parameters():
    param.requires_grad = False

# Replace the last fully-connected layer
# Parameters of newly constructed modules have requires_grad=True by default
model.fc = torch.nn.Linear(512, 3)

optimizer = torch.optim.SGD(model.fc.parameters(), args.lr,
                            momentum=args.momentum, weight_decay=args.weight_decay)

But then I have following error:

File "", line 234, in train
File "/usr/local/lib/python2.7/dist-packages/torch/autograd/", line 146, in backward
    self._execution_engine.run_backward((self,), (gradient,), retain_variables)
RuntimeError: there are no graph nodes that require computing gradients

I would like to freeze all the parameters of the original ResNet-18 and just learn the last layer with 3 classes. How should I do this correctly? Based on information from the forum, this should be the working version.