How to perform fine-tuning in PyTorch?

You can find an example at the bottom of this section of the autograd mechanics notes.


Thanks for your reply.

I have another question. When fine-tuning, shouldn't we use a smaller learning rate for the whole network (http://cs231n.github.io/transfer-learning/)? The example here tells us to freeze the network and learn only the weights of the newly added layers. Which one is recommended?

In the snippet I sent you, only the last layer is optimized. The rest is frozen (the base parameters receive no gradients because they have requires_grad=False). This is as if you used a learning rate of 0 for that part. If you want to use a lower lr for the base, see this section of the optim docs.
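
To make the freezing approach concrete, here is a minimal sketch. It uses a small stand-in network rather than a real pretrained model, and the names `base` and `head` as well as the layer sizes are made up for illustration, so it is not the exact snippet from the notes.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-in: "base" plays the role of the pretrained layers,
# "head" the newly added last layer.
base = nn.Sequential(nn.Linear(8, 16), nn.ReLU())
head = nn.Linear(16, 2)
model = nn.Sequential(base, head)

# Freeze the base: these parameters will not receive gradients.
for p in base.parameters():
    p.requires_grad = False

# Only the head's parameters are handed to the optimizer.
optimizer = torch.optim.SGD(head.parameters(), lr=0.01, momentum=0.9)

x = torch.randn(4, 8)
target = torch.tensor([0, 1, 0, 1])
base_before = [p.detach().clone() for p in base.parameters()]
head_before = [p.detach().clone() for p in head.parameters()]

# One training step.
loss = nn.functional.cross_entropy(model(x), target)
optimizer.zero_grad()
loss.backward()
optimizer.step()

# The frozen base is untouched and has no gradients; the head was updated.
base_unchanged = all(torch.equal(a, b) for a, b in zip(base_before, base.parameters()))
base_has_no_grad = all(p.grad is None for p in base.parameters())
head_changed = any(not torch.equal(a, b) for a, b in zip(head_before, head.parameters()))
print(base_unchanged, base_has_no_grad, head_changed)
```

With a real pretrained model the pattern is the same: set requires_grad=False on the base parameters and give the optimizer only the parameters of the new layer.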


Thanks, it helped.

optimizer = torch.optim.SGD([
            {'params': model.conv1.parameters()},
            {'params': model.bn1.parameters()},
            {'params': model.relu.parameters()},
            {'params': model.maxpool.parameters()},
            {'params': model.layer1.parameters()},
            {'params': model.layer2.parameters()},
            {'params': model.layer3.parameters()},
            {'params': model.layer4.parameters()},
            {'params': model.avgpool.parameters()},
            {'params': model.fc.parameters(), 'lr': opt.lr}
        ], lr=opt.lr*0.1, momentum=0.9)

Is this the correct way to set different learning rates for different layers of ResNet18? Or is there a better way to do it?

This should do it:

ignored_params = list(map(id, model.fc.parameters()))
base_params = filter(lambda p: id(p) not in ignored_params,
                     model.parameters())

optimizer = torch.optim.SGD([
            {'params': base_params},
            {'params': model.fc.parameters(), 'lr': opt.lr}
        ], lr=opt.lr*0.1, momentum=0.9)
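
If you want to confirm which learning rate each group ended up with, you can inspect `optimizer.param_groups`. A small sketch under some assumptions: `TinyNet` is a made-up stand-in that only mimics having an `fc` head, and `base_lr` is a placeholder for `opt.lr`.

```python
import torch
import torch.nn as nn

class TinyNet(nn.Module):
    """Hypothetical stand-in with an `fc` head, like torchvision ResNets."""
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 4, kernel_size=3)
        self.fc = nn.Linear(4, 2)

model = TinyNet()
base_lr = 0.01  # placeholder for opt.lr

# Collect every parameter that does not belong to the fc layer.
fc_ids = {id(p) for p in model.fc.parameters()}
base_params = [p for p in model.parameters() if id(p) not in fc_ids]

optimizer = torch.optim.SGD([
    {'params': base_params},                           # falls back to the default lr
    {'params': model.fc.parameters(), 'lr': base_lr},  # overrides the default lr
], lr=base_lr * 0.1, momentum=0.9)

# One entry per parameter group, in the order the groups were passed in.
group_lrs = [g['lr'] for g in optimizer.param_groups]
group_sizes = [len(g['params']) for g in optimizer.param_groups]
print(group_lrs, group_sizes)
```

The first group (the base) gets the default lr passed to the constructor, and the second group (fc) keeps its own value.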


Great! Thanks.

This may also help you learn how to modify a layer without changing the other layers’ parameters and construct a new model.

    import torch
    from torchvision import models

    model = models.vgg16(pretrained=True)
    print(list(list(model.classifier.children())[1].parameters()))
    # Drop the last classifier layer and append a new 2-class Linear layer
    mod = list(model.classifier.children())
    mod.pop()
    mod.append(torch.nn.Linear(4096, 2))
    new_classifier = torch.nn.Sequential(*mod)
    print(list(list(new_classifier.children())[1].parameters()))
    model.classifier = new_classifier

Also, you may consider adding pretrained models for VGG, which you can find here:

Fine-tuning ResNet is easier:

model = models.resnet18(pretrained=True)
# resnet18's fc layer takes 512 input features, so read the size from the model
model.fc = torch.nn.Linear(model.fc.in_features, 2)

How do I add new layers to existing pretrained models? Here, the last layer is replaced by name with a Linear layer. How do I add another layer after that?

import torch.nn as nn
import torchvision

class MyModel(nn.Module):
    def __init__(self, pretrained_model):
        super(MyModel, self).__init__()  # required before assigning submodules
        self.pretrained_model = pretrained_model
        self.last_layer = ... # create layer

    def forward(self, x):
        return self.last_layer(self.pretrained_model(x))

pretrained_model = torchvision.models.resnet18(pretrained=True)
model = MyModel(pretrained_model)

Thank you for the help.

Another question: I cannot see a classifier attribute when loading a pretrained ResNet-18. However, I can access the children directly without needing classifier. Is the classifier attribute part of an older version of PyTorch, or am I missing something?

ResNets don’t have a classifier attribute. They use only a single fully-connected layer and it’s available as their fc attribute.


@zhoubinxyz could you add a more complete example that shows how to fine-tune a VGG model on a custom dataset using PyTorch?


You can get the basic idea from this PR.


There is an option named --pretrained in the imagenet main.py file. What happens if I use this option with a custom dataset, like this:
python main.py --arch=alexnet --pretrained my_custom_dataset
It looks like this would perform fine-tuning.

I think --pretrained is meant for evaluation mode. The script doesn’t support finetuning at the moment.


@apaszke I referred to this PR for fine-tuning. For AlexNet and VGG, the original code replaces all the fully-connected layers. May I ask:

  • How can I replace only the last fully-connected layer for fine-tuning and freeze the other fully-connected layers?
  • Is this forward the right way to write it? You gave this reference code above:

def forward(self, x):
    return self.last_layer(self.pretrained_model(x))

Original fine-tuning code:

class FineTuneModel(nn.Module):
    def __init__(self, original_model, arch, num_classes):
        super(FineTuneModel, self).__init__()

        if arch.startswith('alexnet') :
            self.features = original_model.features
            self.classifier = nn.Sequential(
                nn.Dropout(),
                nn.Linear(256 * 6 * 6, 4096),
                nn.ReLU(inplace=True),
                nn.Dropout(),
                nn.Linear(4096, 4096),
                nn.ReLU(inplace=True),
                nn.Linear(4096, num_classes),
            )
            self.modelName = 'alexnet'
        elif arch.startswith('resnet') :
            # Everything except the last linear layer
            self.features = nn.Sequential(*list(original_model.children())[:-1])
            self.classifier = nn.Sequential(
                nn.Linear(512, num_classes)
            )
            self.modelName = 'resnet'
        elif arch.startswith('vgg16'):
            self.features = original_model.features
            self.classifier = nn.Sequential(
                nn.Dropout(),
                nn.Linear(25088, 4096),
                nn.ReLU(inplace=True),
                nn.Dropout(),
                nn.Linear(4096, 4096),
                nn.ReLU(inplace=True),
                nn.Linear(4096, num_classes),
            )
            self.modelName = 'vgg16'
        else :
            # Raising a plain string is a TypeError; raise a real exception instead
            raise ValueError("Finetuning not supported on this architecture yet")

        # Freeze those weights
        for p in self.features.parameters():
            p.requires_grad = False


    def forward(self, x):
        f = self.features(x)
        if self.modelName == 'alexnet' :
            f = f.view(f.size(0), 256 * 6 * 6)
        elif self.modelName == 'vgg16':
            f = f.view(f.size(0), -1)
        elif self.modelName == 'resnet' :
            f = f.view(f.size(0), -1)
        y = self.classifier(f)
        return y

This post should help you.


I added the following lines to the imagenet example, using the pretrained resnet18 model.

for param in model.parameters():
    param.requires_grad = False

# Replace the last fully-connected layer
# Parameters of newly constructed modules have requires_grad=True by default
model.fc = torch.nn.Linear(512, 3)

optimizer = torch.optim.SGD(model.fc.parameters(), args.lr,
                            momentum=args.momentum,
                            weight_decay=args.weight_decay)

But then I get the following error:

File "main.py", line 234, in train
    loss.backward()
File "/usr/local/lib/python2.7/dist-packages/torch/autograd/variable.py", line 146, in backward
    self._execution_engine.run_backward((self,), (gradient,), retain_variables)
RuntimeError: there are no graph nodes that require computing gradients

I would like to freeze all parameters of the original ResNet18 and learn only the last layer with 3 classes. How should I do this correctly? Based on information from the forum, this should be the working version.

That should work. Can you post the entire code, so that I can check it for errors and maybe try running it here?
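
In the meantime, a quick sanity check that may help narrow it down: list the parameters that still require gradients, and confirm that the loss itself is attached to the graph before calling backward. This is only a sketch with a tiny stand-in model (the layer sizes and names are placeholders, not the imagenet example), and it does not claim to identify the cause of the error above.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the frozen ResNet18 body plus a new 3-class head.
model = nn.Sequential(nn.Linear(10, 512), nn.ReLU())
for param in model.parameters():
    param.requires_grad = False
model.add_module('fc', nn.Linear(512, 3))  # new layer, trainable by default

# Which parameters will actually be trained?
trainable = [name for name, p in model.named_parameters() if p.requires_grad]
print(trainable)

# The loss must require grad, otherwise backward() has nothing to compute.
inputs = torch.randn(4, 10)
targets = torch.tensor([0, 1, 2, 0])
loss = nn.functional.cross_entropy(model(inputs), targets)
loss_requires_grad = loss.requires_grad
print(loss_requires_grad)
loss.backward()
```

If the list of trainable parameters is empty, or the loss does not require grad, then the layer being optimized is probably not the one used in the forward pass.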