How to set a different learning rate for a single layer in a network

Hi,

I am trying to change the learning rate for an arbitrary single layer (which is part of an nn.Sequential block). For example, I use a VGG16 network and wish to control the learning rate of one of the fully connected layers in the classifier.

Going by this link: https://pytorch.org/docs/0.3.0/optim.html#per-parameter-options, we can specify the learning rate like this -

optim.SGD([
    {'params': model.base.parameters()},
    {'params': model.classifier.parameters(), 'lr': 1e-3}
], lr=1e-2, momentum=0.9)

But here, both base and classifier are entire blocks. In the VGG16 network for example, I want to change the learning rate for classifier[0] / classifier[3] / classifier[6], which are linear layers. Any ideas as to how that can be accomplished?

VGG16 network:
VGG(
  (features): Sequential(
    (0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): ReLU(inplace)
    (2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (3): ReLU(inplace)
    (4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (5): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (6): ReLU(inplace)
    (7): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (8): ReLU(inplace)
    (9): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (10): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (11): ReLU(inplace)
    (12): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (13): ReLU(inplace)
    (14): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (15): ReLU(inplace)
    (16): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (17): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (18): ReLU(inplace)
    (19): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (20): ReLU(inplace)
    (21): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (22): ReLU(inplace)
    (23): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (24): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (25): ReLU(inplace)
    (26): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (27): ReLU(inplace)
    (28): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (29): ReLU(inplace)
    (30): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (avgpool): AdaptiveAvgPool2d(output_size=(7, 7))
  (classifier): Sequential(
    (0): Linear(in_features=25088, out_features=4096, bias=True)
    (1): ReLU(inplace)
    (2): Dropout(p=0.5)
    (3): Linear(in_features=4096, out_features=4096, bias=True)
    (4): ReLU(inplace)
    (5): Dropout(p=0.75)
    (6): Linear(in_features=4096, out_features=10, bias=True)
    (7): Softmax()
  )
)

You just need to create more parameter groups, in the same way as in that example.
Loop over model.named_parameters(), filter the parameters by name into separate lists, build a list of dicts (one dict per group), and pass that list to optim.SGD.

The only constraint is that you cannot repeat parameters; so if you split up the classifier's parameters, you will have to assign all of them with this method.
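
For example, a minimal sketch of that idea (the layer name 'classifier.3' and the learning rates below are placeholders, not something prescribed here):

import torch
from torchvision.models import vgg16

model = vgg16()

# Split the parameters into two groups by name.
special, others = [], []
for name, param in model.named_parameters():
    if name.startswith('classifier.3'):  # the single layer to treat differently
        special.append(param)
    else:
        others.append(param)

# The first group uses the default lr below, the second group overrides it.
optimizer = torch.optim.SGD(
    [{'params': others},
     {'params': special, 'lr': 1e-4}],
    lr=1e-2, momentum=0.9)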

Thank you for the response. Here is what I did:

my_list = ['classifier.3.weight', 'classifier.3.bias']
params = list(filter(lambda kv: kv[0] in my_list, vgg16.named_parameters()))
base_params = list(filter(lambda kv: kv[0] not in my_list, vgg16.named_parameters()))

And then defined the optimizer:

optimizer = SGD([{'params': base_params}, {'params': params, 'lr': 1e-4}], lr=3e-6, momentum=0.9)

However, I get the following error:

TypeError: optimizer can only optimize Tensors, but one of the params is tuple

I get the same error when I try this as well:

optimizer = SGD([{'params': base_params, 'lr': 3e-6, 'momentum': 0.9 }, {'params': params, 'lr': 1e-4, 'momentum': 0.9}])

I am not entirely sure what I need to change. Any ideas?

Try this, hope it helps!

optimizer = SGD([{'params': model.classifier[0].parameters(), 'lr': 3e-6, 'momentum': 0.9},
                 {'params': model.classifier[1].parameters(), 'lr': 1e-4, 'momentum': 0.9}])

Please correct me if I am wrong, but here the learning rates have been set for only two layers of the network: classifier[0] and classifier[1]. The rest of the network doesn’t have a learning rate associated with it.

What I wish to accomplish is to change the learning rate for a single layer only (in a Sequential block), and have a common learning rate for the rest of the layers.

Try this:

optimizer = SGD([{'params': model.classifier[0].parameters(), 'lr': 3e-6, 'momentum': 0.9}],
                model.parameters, lr=1e-2, momentum=0.9)

Hi, when I try this, it returns the following error:

TypeError: __init__() got multiple values for argument 'lr'

I am not quite sure what change I need to make. As @JuanFMontesinos mentioned, I think I need to specify separate parameter lists for each learning rate. Though I don’t know how to do that, given the error I mentioned earlier:

TypeError: optimizer can only optimize Tensors, but one of the params is tuple

Any ideas?

I’m afraid you can only set each parameter once. With what @sai_tharun suggested, you would be passing classifier[0]'s parameters twice.

The problem is that once you set per-parameter options for part of the classifier, you need to assign all of the classifier's parameters explicitly.
As I mentioned above, you need to do something like:

for name, param in model.classifier.named_parameters():
    if name == your_layer_name:
        special_params.append(param)
    else:
        other_params.append(param)

Of course, you also have to pass the rest of the network (everything that is not model.classifier), as you were doing.
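
Put together, a sketch of that approach could look like this (assuming classifier[3] is the layer you want to single out; the learning rates are placeholders):

import torch
from torchvision.models import vgg16

model = vgg16()

# Split the classifier's parameters: the chosen layer vs. the rest of the classifier.
target, other_classifier = [], []
for name, param in model.classifier.named_parameters():
    if name.startswith('3.'):  # names here look like '3.weight', '3.bias'
        target.append(param)
    else:
        other_classifier.append(param)

# Pass the rest of the network (everything outside the classifier) as its own group.
# avgpool has no parameters, so features + classifier covers the whole model.
optimizer = torch.optim.SGD(
    [{'params': model.features.parameters()},
     {'params': other_classifier},
     {'params': target, 'lr': 1e-4}],
    lr=3e-6, momentum=0.9)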

I am not quite sure what you mean.

As you can see, using the following code (similar to what @ptrblck details in link):

my_list = ['classifier.3.weight', 'classifier.3.bias']
params = list(filter(lambda kv: kv[0] in my_list, vgg16.named_parameters()))
base_params = list(filter(lambda kv: kv[0] not in my_list, vgg16.named_parameters()))

returns two lists: the parameters of the layer that needs a different learning rate (classifier[3] in this case), and those of the rest of the network. I am having trouble passing these to the optimizer like so:

optimizer = SGD([{'params': base_params}, {'params': params, 'lr': 1e-4}], lr=3e-6, momentum=0.9)

which I think is incorrect, since it gives me errors. Any thoughts @ptrblck, @JuanFMontesinos?


Hi,
That way you are also passing the names, since params contains tuples of (name, parameter). Could you please use this instead:

from torchvision.models import vgg16
from torch.optim import SGD
model = vgg16()
my_list = ['classifier.3.weight', 'classifier.3.bias']
params = list(map(lambda x: x[1],list(filter(lambda kv: kv[0] in my_list, model.named_parameters()))))
base_params = list(map(lambda x: x[1],list(filter(lambda kv: kv[0] not in my_list, model.named_parameters()))))
optimizer = SGD([{'params': base_params}, {'params': params, 'lr': 1e-4}], lr=3e-6, momentum=0.9)

This is ok, I promise :slight_smile:
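
If you want to double-check, you can print the learning rate and size of each param group; this is just a quick sanity check, not something the solution requires:

for i, group in enumerate(optimizer.param_groups):
    print(i, group['lr'], len(group['params']))
# Expected here: group 0 with lr=3e-06 and everything except classifier[3],
# group 1 with lr=0.0001 and exactly two tensors (the weight and bias of classifier[3]).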


This works! Thank you very much!

Hi, I tried this but it is giving me the error: “TypeError: SGD() got multiple values for argument ‘lr’”. Do you have any suggestion on what the problem could be and how to solve it?
Thanks and regards,
Charvi

optimizer = SGD([{'params': base_params}, {'params': params, 'lr': 1e-4}], lr=3e-6, momentum=0.9)

Is it possible you are repeating the ‘lr’ argument in any of the dictionaries you are passing?

It does not look like it. I have used it in exactly the same way:

my_list = ['classifier.3.weight', 'classifier.3.bias']
params = list(map(lambda x: x[1],list(filter(lambda kv: kv[0] in my_list, model.named_parameters()))))
base_params = list(map(lambda x: x[1],list(filter(lambda kv: kv[0] not in my_list, model.named_parameters()))))
optimizer = SGD([{'params': base_params}, {'params': params, 'lr': 1e-4}], lr=3e-6, momentum=0.9)

params and base_params do not contain the names of the parameters involved, but I printed their shapes and they seem to be distinct.

I also tried

optimizer = SGD([{'params': base_params, 'lr': 3e-6}, {'params': params, 'lr': 1e-4}], momentum=0.9)

and this gave me the error TypeError: '<' not supported between instances of 'list' and 'float' at the start of the training loop:

File "xxx.py", line 251, in main
    with Trainer(model, optimizer, F.cross_entropy, scheduler=scheduler, callbacks=_callbacks) as trainer:
  File "/usr/local/lib/python3.8/site-packages/homura/trainers.py", line 519, in __init__
    super(SupervisedTrainer, self).__init__(model, optimizer, loss_f, callbacks=callbacks, scheduler=scheduler,
  File "/usr/local/lib/python3.8/site-packages/homura/trainers.py", line 121, in __init__
    self.set_optimizer()
  File "/usr/local/lib/python3.8/site-packages/homura/trainers.py", line 422, in set_optimizer
    self.optimizer = optimizer(self.model.parameters())
  File "/usr/local/lib/python3.8/site-packages/torch/optim/sgd.py", line 57, in __init__
    if lr is not required and lr < 0.0:
TypeError: '<' not supported between instances of 'list' and 'float'

Any suggestions are welcome and deeply appreciated! :slight_smile:

Sorry for the late reply, but without the code it's hard to tell; it looks like there is a mistake somewhere.
Can you post some standalone code?

Hi, the problem got solved.
I was using a GitHub repository as my base code, and the problem was that I was using the optim function defined there instead of torch.optim. I changed it to torch.optim and followed your method, and it works very well. Sorry for the confusion and thanks for the help.