How to set a different learning rate for a single layer in a network

partially_observed · June 20, 2019, 10:16pm

Hi,

I am trying to change the learning rate for any arbitrary single layer (which is part of a nn.Sequential block). For example, I use a VGG16 network and wish to control the learning rate of one of the fully connected layers in the classifier.

Going by this link: https://pytorch.org/docs/0.3.0/optim.html#per-parameter-options, we can specify the learning rate like this -

optim.SGD([ {'params': model.base.parameters()}, {'params': model.classifier.parameters(), 'lr': 1e-3} ], lr=1e-2, momentum=0.9)
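For reference, here is a minimal runnable version of that docs pattern. It assumes a hypothetical toy model whose base and classifier submodules stand in for the real ones. Printing optimizer.param_groups shows that a group without its own 'lr' falls back to the keyword default.

```python
from collections import OrderedDict

import torch.nn as nn
import torch.optim as optim

# Hypothetical toy model exposing the `base` / `classifier` submodules used in the docs example
model = nn.Sequential(OrderedDict(
    base=nn.Linear(4, 8),
    classifier=nn.Linear(8, 2),
))

optimizer = optim.SGD(
    [
        {'params': model.base.parameters()},                    # no 'lr' key: uses lr=1e-2 below
        {'params': model.classifier.parameters(), 'lr': 1e-3},  # overrides lr for this group only
    ],
    lr=1e-2,
    momentum=0.9,
)

for i, group in enumerate(optimizer.param_groups):
    print(i, group['lr'], group['momentum'], len(group['params']))
```

Each group is a plain dict, so other options SGD accepts (momentum, weight_decay, ...) can be overridden per group in the same way.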

But here, both base and classifier are entire blocks. In the VGG16 network for example, I want to change the learning rate for classifier[0] / classifier[3] / classifier[6], which are linear layers. Any ideas as to how that can be accomplished?

VGG16 network:
VGG(
  (features): Sequential(
    (0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): ReLU(inplace)
    (2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (3): ReLU(inplace)
    (4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (5): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (6): ReLU(inplace)
    (7): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (8): ReLU(inplace)
    (9): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (10): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (11): ReLU(inplace)
    (12): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (13): ReLU(inplace)
    (14): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (15): ReLU(inplace)
    (16): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (17): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (18): ReLU(inplace)
    (19): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (20): ReLU(inplace)
    (21): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (22): ReLU(inplace)
    (23): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (24): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (25): ReLU(inplace)
    (26): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (27): ReLU(inplace)
    (28): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (29): ReLU(inplace)
    (30): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (avgpool): AdaptiveAvgPool2d(output_size=(7, 7))
  (classifier): Sequential(
    (0): Linear(in_features=25088, out_features=4096, bias=True)
    (1): ReLU(inplace)
    (2): Dropout(p=0.5)
    (3): Linear(in_features=4096, out_features=4096, bias=True)
    (4): ReLU(inplace)
    (5): Dropout(p=0.75)
    (6): Linear(in_features=4096, out_features=10, bias=True)
    (7): Softmax()
  )
)

JuanFMontesinos · June 20, 2019, 10:40pm

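You just need to create more groups, as you did. Run something like:

param_groups = []  # list of dicts, one per learning-rate group
for name, param in model.named_parameters():
    ...  # filter by name and append param to the matching group's dict
optimizer = optim.SGD(param_groups, lr=1e-2)  # default lr for groups without their own 'lr'

The only constraint is that you cannot repeat parameters: if you split up the classifier's parameters, you have to assign all of them through this method.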
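A minimal sketch of that loop, assuming a small hypothetical stand-in for VGG16 that keeps the same features/classifier parameter names. The prefix-to-learning-rate mapping (special_lrs) and its values are illustrative assumptions. Every parameter that matches no prefix lands in the default group, so nothing is repeated and nothing is left out.

```python
from collections import OrderedDict

import torch.nn as nn
import torch.optim as optim

# Hypothetical stand-in for VGG16: same 'features' / 'classifier' naming, much smaller layers
model = nn.Sequential(OrderedDict(
    features=nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU()),
    classifier=nn.Sequential(
        nn.Linear(8, 16), nn.ReLU(), nn.Dropout(0.5),
        nn.Linear(16, 16), nn.ReLU(), nn.Dropout(0.5),
        nn.Linear(16, 10),
    ),
))

# Learning rate per parameter-name prefix (illustrative values)
special_lrs = {'classifier.3.': 1e-4}
default_lr = 3e-6

groups = {prefix: [] for prefix in special_lrs}
default_params = []
for name, param in model.named_parameters():
    for prefix in special_lrs:
        if name.startswith(prefix):
            groups[prefix].append(param)
            break
    else:  # no prefix matched: goes to the default group
        default_params.append(param)

# One dict per group; only the special groups carry their own 'lr'
param_groups = [{'params': default_params}]
param_groups += [{'params': ps, 'lr': special_lrs[prefix]} for prefix, ps in groups.items()]

optimizer = optim.SGD(param_groups, lr=default_lr, momentum=0.9)
```

The trailing dot in 'classifier.3.' keeps the prefix from also matching a name such as 'classifier.30.weight'.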

partially_observed · June 21, 2019, 4:01pm

Thank you for the response. Here is what I did:

my_list = ['classifier.3.weight', 'classifier.3.bias']
params = list(filter(lambda kv: kv[0] in my_list, vgg16.named_parameters()))
base_params = list(filter(lambda kv: kv[0] not in my_list, vgg16.named_parameters()))

And then defined the optimizer:

optimizer = SGD([{'params': base_params}, {'params': params, 'lr': 1e-4}], lr=3e-6, momentum=0.9)

However, I get the following error:

TypeError: optimizer can only optimize Tensors, but one of the params is tuple

I get the same error when I try this as well:

optimizer = SGD([{'params': base_params, 'lr': 3e-6, 'momentum': 0.9 }, {'params': params, 'lr': 1e-4, 'momentum': 0.9}])

I am not entirely sure what I need to change. Any ideas?

sai_tharun · June 21, 2019, 5:40pm

Try this, hope it helps!

optimizer = SGD([{'params': model.classifier[0].parameters(), 'lr': 3e-6, 'momentum': 0.9},
                 {'params': model.classifier[1].parameters(), 'lr': 1e-4, 'momentum': 0.9}])

partially_observed · June 21, 2019, 6:01pm

Please correct me if I am wrong, but here the learning rates have been set only for two layers of the network: classifier[0] and classifier[1]. The rest of the network is not passed to the optimizer at all, so it would not be updated.

What I wish to accomplish is to change the learning rate for a single layer only (in a Sequential block), and have a common learning rate for the rest of the layers.
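One way to do exactly that is to pick the layer by module rather than by name and put everything else in a common group. This is a sketch on a hypothetical small model with the same classifier indices as the VGG16 above. Parameters are compared by id() because a membership test on tensors (p in some_list) can fall back to element-wise ==, which does not give a single True/False.

```python
from collections import OrderedDict

import torch.nn as nn
import torch.optim as optim

# Hypothetical small stand-in for the VGG16 in the question (same 'classifier' indices)
model = nn.Sequential(OrderedDict(
    features=nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU()),
    classifier=nn.Sequential(
        nn.Linear(8, 16), nn.ReLU(), nn.Dropout(0.5),
        nn.Linear(16, 16), nn.ReLU(), nn.Dropout(0.5),
        nn.Linear(16, 10),
    ),
))

special_layer = model.classifier[3]            # the single layer that gets its own lr
special_params = list(special_layer.parameters())
special_ids = {id(p) for p in special_params}  # compare by identity, not by tensor value

# Every other parameter of the network, in registration order
base_params = [p for p in model.parameters() if id(p) not in special_ids]

optimizer = optim.SGD(
    [
        {'params': base_params},                 # common lr for the rest of the network
        {'params': special_params, 'lr': 1e-4},  # different lr for classifier[3] only
    ],
    lr=3e-6,
    momentum=0.9,
)
```

Selecting by module avoids hard-coding parameter names, so the same code works if the layer is later given a different name or position.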

sai_tharun · June 21, 2019, 6:16pm

try this

optimizer = SGD([{'params': model.classifier[0].parameters(), 'lr': 3e-6, 'momentum': 0.9}],
                model.parameters, lr=1e-2, momentum=0.9)

partially_observed · June 21, 2019, 6:48pm

Hi, when I try this, I get the following error:

TypeError: __init__() got multiple values for argument 'lr'

I am not quite sure what change I need to make. As @JuanFMontesinos mentioned, I think I need to specify a separate parameter list for each learning rate, though I don’t know how to do that given the error I mentioned earlier:

TypeError: optimizer can only optimize Tensors, but one of the params is tuple

Any ideas?

JuanFMontesinos · June 21, 2019, 7:05pm

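I’m afraid each parameter can only be assigned to one group. With @sai_tharun’s approach you are passing the parameters of classifier[0] twice (once in the first group and again through model.parameters). The immediate TypeError comes from model.parameters being passed as the second positional argument of SGD, which is lr, while lr=1e-2 is also given as a keyword.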
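Both failure modes can be reproduced on a hypothetical three-layer toy model. This sketch assumes a recent PyTorch, where SGD's second positional parameter is lr and add_param_group rejects a tensor that already belongs to another group.

```python
import torch.nn as nn
import torch.optim as optim

model = nn.Sequential(nn.Linear(4, 4), nn.ReLU(), nn.Linear(4, 2))  # hypothetical toy model
positional_error = duplicate_error = None

# 1) A stray positional argument lands in SGD's `lr` slot and collides with lr=...
try:
    optim.SGD([{'params': model[0].parameters(), 'lr': 3e-6}],
              model.parameters, lr=1e-2, momentum=0.9)
except TypeError as e:
    positional_error = str(e)
    print('TypeError:', e)

# 2) Moving model.parameters() into a second group fixes the TypeError,
#    but model[0]'s weight and bias now appear in both groups.
try:
    optim.SGD([{'params': model[0].parameters(), 'lr': 3e-6},
               {'params': model.parameters()}], lr=1e-2, momentum=0.9)
except ValueError as e:
    duplicate_error = str(e)
    print('ValueError:', e)
```

The fix for both is the same: every parameter goes into exactly one dict in the list, and nothing else is passed positionally.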

The catch is that once you split up the classifier's parameters, you need to assign all of them.
As I mentioned before, you need to do something like:

layer_params, others = [], []
for name, param in model.classifier.named_parameters():
    if name in yourlayer_names:  # e.g. {'3.weight', '3.bias'} for classifier[3]
        layer_params.append(param)
    else:
        others.append(param)

Of course, you also have to pass the rest of the network (everything that is not model.classifier), as you were doing.
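A sketch of that recipe, assuming a hypothetical small model with the same features/avgpool/classifier layout as VGG16. Names from model.classifier.named_parameters() are relative to the classifier ('3.weight', not 'classifier.3.weight'), and the rest of the network is collected here from every top-level child other than the classifier.

```python
from collections import OrderedDict

import torch.nn as nn
import torch.optim as optim

# Hypothetical small stand-in for VGG16 with the same submodule names
model = nn.Sequential(OrderedDict(
    features=nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU()),
    avgpool=nn.AdaptiveAvgPool2d((1, 1)),
    classifier=nn.Sequential(
        nn.Linear(8, 16), nn.ReLU(), nn.Dropout(0.5),
        nn.Linear(16, 16), nn.ReLU(), nn.Dropout(0.5),
        nn.Linear(16, 10),
    ),
))

layer_names = {'3.weight', '3.bias'}  # relative to model.classifier
layer_params, other_classifier_params = [], []
for name, param in model.classifier.named_parameters():
    if name in layer_names:
        layer_params.append(param)
    else:
        other_classifier_params.append(param)

# "The rest of the network": parameters of every top-level child except the classifier
rest_params = [p for child_name, child in model.named_children()
               if child_name != 'classifier'
               for p in child.parameters()]

optimizer = optim.SGD(
    [
        {'params': rest_params},
        {'params': other_classifier_params},
        {'params': layer_params, 'lr': 1e-4},
    ],
    lr=3e-6,
    momentum=0.9,
)
```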

partially_observed · June 21, 2019, 9:07pm

I am not quite sure what you mean.

As you can see, using the following code (similar to what @ptrblck describes in this link):

my_list = ['classifier.3.weight', 'classifier.3.bias']
params = list(filter(lambda kv: kv[0] in my_list, vgg16.named_parameters()))
base_params = list(filter(lambda kv: kv[0] not in my_list, vgg16.named_parameters()))

returns two lists: the parameters of the layer(s) that require a different learning rate (classifier[3] in this case) and those of the rest of the network. I am having trouble passing these to the optimizer like so:

optimizer = SGD([{'params': base_params}, {'params': params, 'lr': 1e-4}], lr=3e-6, momentum=0.9)

I think this is incorrect, since it gives me errors. Any thoughts, @ptrblck, @JuanFMontesinos?

JuanFMontesinos · June 21, 2019, 10:11pm

Hi,
You are also passing the names that way, since params contains (name, parameter) tuples.
Could you please use:

from torchvision.models import vgg16
from torch.optim import SGD
model = vgg16()
my_list = ['classifier.3.weight', 'classifier.3.bias']
# x[1] keeps only the tensor from each (name, param) tuple
params = list(map(lambda x: x[1],list(filter(lambda kv: kv[0] in my_list, model.named_parameters()))))
base_params = list(map(lambda x: x[1],list(filter(lambda kv: kv[0] not in my_list, model.named_parameters()))))
optimizer = SGD([{'params': base_params}, {'params': params, 'lr': 1e-4}], lr=3e-6, momentum=0.9)

This is ok, I promise
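The same selection can be written with list comprehensions that unpack each (name, param) tuple directly. This sketch uses a hypothetical small model with VGG-style names instead of torchvision's vgg16() so it builds quickly, and it adds a check that every name in my_list exists, since a misspelled name would otherwise silently leave that layer in the base group.

```python
from collections import OrderedDict

import torch.nn as nn
from torch.optim import SGD

# Hypothetical small model with VGG-style names (stand-in for torchvision's vgg16())
model = nn.Sequential(OrderedDict(
    features=nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU()),
    classifier=nn.Sequential(
        nn.Linear(8, 16), nn.ReLU(), nn.Dropout(0.5),
        nn.Linear(16, 16), nn.ReLU(), nn.Dropout(0.5),
        nn.Linear(16, 10),
    ),
))

my_list = ['classifier.3.weight', 'classifier.3.bias']

# Guard against typos: every requested name must exist in the model
all_names = {n for n, _ in model.named_parameters()}
missing = set(my_list) - all_names
if missing:
    raise KeyError(f'parameter names not found in model: {sorted(missing)}')

# Unpack each (name, param) tuple and keep only the tensor
params = [p for n, p in model.named_parameters() if n in my_list]
base_params = [p for n, p in model.named_parameters() if n not in my_list]

optimizer = SGD([{'params': base_params}, {'params': params, 'lr': 1e-4}], lr=3e-6, momentum=0.9)
```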

partially_observed · June 25, 2019, 4:24pm

This works! Thank you very much!

CharviVitthal · November 25, 2020, 4:56am

Hi, I tried this but it is giving me the error: “TypeError: SGD() got multiple values for argument ‘lr’”. Do you have any suggestion on what the problem could be and how to solve it?
Thanks and regards,
Charvi

JuanFMontesinos · November 25, 2020, 9:18am

optimizer = SGD([{'params': base_params}, {'params': params, 'lr': 1e-4}], lr=3e-6, momentum=0.9)

Is it possible you are passing the ‘lr’ argument twice somewhere in the call?
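When debugging errors like this, it can help to build the optimizer and then inspect optimizer.param_groups directly. A sketch on a hypothetical toy model; the isinstance check confirms the object really is a torch.optim optimizer rather than a same-named function from another package.

```python
import torch.nn as nn
import torch.optim as optim

model = nn.Sequential(nn.Linear(4, 4), nn.ReLU(), nn.Linear(4, 2))  # hypothetical toy model

optimizer = optim.SGD(
    [{'params': model[0].parameters()}, {'params': model[2].parameters(), 'lr': 1e-4}],
    lr=3e-6,
    momentum=0.9,
)

# Confirm this is a real torch.optim optimizer, then report what each group ended up with
assert isinstance(optimizer, optim.Optimizer)
for i, group in enumerate(optimizer.param_groups):
    n_elems = sum(p.numel() for p in group['params'])
    print(f"group {i}: lr={group['lr']}, momentum={group['momentum']}, "
          f"{len(group['params'])} tensors, {n_elems} elements")
```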

CharviVitthal · November 27, 2020, 1:48am

It does not look like that. I have used it in exactly the same way:

my_list = ['classifier.3.weight', 'classifier.3.bias']
params = list(map(lambda x: x[1],list(filter(lambda kv: kv[0] in my_list, model.named_parameters()))))
base_params = list(map(lambda x: x[1],list(filter(lambda kv: kv[0] not in my_list, model.named_parameters()))))
optimizer = SGD([{'params': base_params}, {'params': params, 'lr': 1e-4}], lr=3e-6, momentum=0.9)

params and base_params do not contain the names of the parameters, but I printed their shapes and they seem to be distinct.
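Matching or differing shapes do not prove that the two lists are disjoint, since different layers can have identical shapes. A sketch of an identity-based check, on a hypothetical model where a features layer and a classifier layer deliberately share the same shape.

```python
from collections import OrderedDict

import torch.nn as nn

# Hypothetical model: features.0 and classifier.2 have identical shapes
model = nn.Sequential(OrderedDict(
    features=nn.Sequential(nn.Linear(16, 16)),
    classifier=nn.Sequential(nn.Linear(16, 16), nn.ReLU(), nn.Linear(16, 16)),
))

my_list = ['classifier.2.weight', 'classifier.2.bias']
params = [p for n, p in model.named_parameters() if n in my_list]
base_params = [p for n, p in model.named_parameters() if n not in my_list]

# Shapes alone cannot tell these tensors apart
print([tuple(p.shape) for p in params])
print([tuple(p.shape) for p in base_params])

# Identity can: each tensor should land in exactly one list, and together they cover the model
ids_params = {id(p) for p in params}
ids_base = {id(p) for p in base_params}
disjoint = ids_params.isdisjoint(ids_base)
covers_all = (ids_params | ids_base) == {id(p) for p in model.parameters()}
print('disjoint:', disjoint, 'covers all parameters:', covers_all)
```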

I also tried:

optimizer = SGD([{'params': base_params, 'lr': 3e-6}, {'params': params, 'lr': 1e-4}], momentum=0.9)

but this gave me the following error at the start of the training loop:

  File "xxx.py", line 251, in main
    with Trainer(model, optimizer, F.cross_entropy, scheduler=scheduler, callbacks=_callbacks) as trainer:
  File "/usr/local/lib/python3.8/site-packages/homura/trainers.py", line 519, in __init__
    super(SupervisedTrainer, self).__init__(model, optimizer, loss_f, callbacks=callbacks, scheduler=scheduler,
  File "/usr/local/lib/python3.8/site-packages/homura/trainers.py", line 121, in __init__
    self.set_optimizer()
  File "/usr/local/lib/python3.8/site-packages/homura/trainers.py", line 422, in set_optimizer
    self.optimizer = optimizer(self.model.parameters())
  File "/usr/local/lib/python3.8/site-packages/torch/optim/sgd.py", line 57, in __init__
    if lr is not required and lr < 0.0:
TypeError: '<' not supported between instances of 'list' and 'float'

Any suggestions are welcome and deeply appreciated!

JuanFMontesinos · November 30, 2020, 4:37pm

Sorry for the late reply, but without the code it is hard to tell where the mistake is.
Can you post some standalone code?

CharviVitthal · December 1, 2020, 12:42am

Hi, the problem got solved.
I was using a GitHub repository as my base code, and the problem was that I was using the optim functions defined there instead of torch.optim. I changed it to torch.optim, followed your method, and it works very well. Sorry for the confusion, and thanks for the help.