How can I fine-tune a torchvision model zoo model with different learning rates?

I would like to fine-tune the models in the torchvision model zoo on my own dataset. I need to set a different learning rate for the original layers than for the modified classifier layer. I modified the ResNet model like this:

    model = torchvision.models.resnet101(pretrained = False)
    model.fc = nn.Linear(in_features = 2048, out_features = 10)

How should I set up the optimizer? This does not seem to work:

    optimizer = torch.optim.SGD(
            {'params': model.parameters()[:-1], 'lr': 1e-4, 'momentum': 0.9, 'weight_decay': 1e-4},
            {'params': model.parameters()[-1], 'lr': 5e-3, 'momentum': 0.9, 'weight_decay': 1e-4},)

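I think there are a couple of minor errors: the parameter groups have to be wrapped in a list (the bracket is missing), and `model.parameters()` returns a generator, which cannot be sliced.
This should work: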

    optimizer = torch.optim.SGD([
        # list() materializes the generator so it can be sliced
        {'params': list(model.parameters())[:-1], 'lr': 1e-4, 'momentum': 0.9, 'weight_decay': 1e-4},
        {'params': list(model.parameters())[-1], 'lr': 5e-3, 'momentum': 0.9, 'weight_decay': 1e-4}
    ])

Got it, but consider a more awkward case with ResNet-101:

  (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
  (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu): ReLU(inplace)
  (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
  (layer1): Sequential( ... )
  (layer2): Sequential( ... )
  (layer3): Sequential( ... )
  (layer4): Sequential( ... )
  (avgpool): AvgPool2d(kernel_size=7, stride=1, padding=0)
  (fc): Linear(in_features=2048, out_features=10, bias=True)

Suppose I need to fine-tune the parameters of layer2 with lr = 1e-3 and all other parameters with lr = 1e-4. How would I pass them to the optimizer?

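In this case, I would put layer2 in its own parameter group with its own learning rate and let all other parameters use the optimizer's default learning rate, as shown in the per-parameter options section of the torch.optim docs.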

Here is a small example I created that shows how to filter out specific layers.
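The linked example is not reproduced in this thread, so the following is only a minimal sketch of the idea, not the original code. It assumes a tiny stand-in module (`TinyNet`, hypothetical) whose submodule names mirror the printout above; the same name-based filtering should carry over to `torchvision.models.resnet101`, since it also exposes a `layer2` submodule.

```python
import torch
import torch.nn as nn


class TinyNet(nn.Module):
    """Hypothetical stand-in with the same submodule names as the ResNet printout."""

    def __init__(self):
        super().__init__()
        self.layer1 = nn.Linear(8, 8)
        self.layer2 = nn.Linear(8, 8)
        self.fc = nn.Linear(8, 10)

    def forward(self, x):
        return self.fc(self.layer2(self.layer1(x)))


model = TinyNet()

# Split the parameters by name: everything under "layer2." goes in one group,
# everything else goes in the other.
layer2_params = [p for n, p in model.named_parameters() if n.startswith("layer2.")]
other_params = [p for n, p in model.named_parameters() if not n.startswith("layer2.")]

optimizer = torch.optim.SGD(
    [
        {"params": other_params},                # falls back to the default lr below
        {"params": layer2_params, "lr": 1e-3},   # per-group override for layer2
    ],
    lr=1e-4,          # default for groups that do not set their own lr
    momentum=0.9,
    weight_decay=1e-4,
)

for group in optimizer.param_groups:
    print(len(group["params"]), group["lr"])
```

Filtering by name rather than by position keeps the split correct even if layers are added or reordered later.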

Thank you so much, that is very helpful!

For the ResNet model, I believe this method is not sufficient. The final Linear layer has two parameters, the weight and the bias. By using `list(model.parameters())[-1]`, you apply the different learning rate only to the bias term of the final Linear layer, while its weight stays in the first group.
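One way to cover both the weight and the bias is to take the head's parameters from the `fc` submodule directly and put everything else in the base group. This is a sketch under assumptions, not a tested recipe for ResNet-101: `SmallBackbone` is a hypothetical stand-in that only shares the `fc` attribute name with the torchvision model.

```python
import torch
import torch.nn as nn


class SmallBackbone(nn.Module):
    """Hypothetical stand-in: a feature layer followed by a replaceable `fc` head."""

    def __init__(self):
        super().__init__()
        self.features = nn.Linear(16, 32)
        self.fc = nn.Linear(32, 10)

    def forward(self, x):
        return self.fc(torch.relu(self.features(x)))


model = SmallBackbone()

# Both parameters of the head: fc.weight and fc.bias.
head_params = list(model.fc.parameters())
head_ids = {id(p) for p in head_params}

# Every parameter that is not part of the head.
base_params = [p for p in model.parameters() if id(p) not in head_ids]

optimizer = torch.optim.SGD(
    [
        {"params": base_params, "lr": 1e-4},  # original layers
        {"params": head_params, "lr": 5e-3},  # new classifier layer
    ],
    lr=1e-4,  # default; both groups override it explicitly
    momentum=0.9,
    weight_decay=1e-4,
)
```

Comparing by `id` avoids elementwise tensor comparison, which is what `in` on a list of tensors would attempt.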