How to set a different learning rate for a single layer in a network

Hi,

I am trying to change the learning rate for any arbitrary single layer (which is part of a nn.Sequential block). For example, I use a VGG16 network and wish to control the learning rate of one of the fully connected layers in the classifier.

Going by this link: https://pytorch.org/docs/0.3.0/optim.html#per-parameter-options, we can specify the learning rate like this -

optim.SGD([ {'params': model.base.parameters()}, {'params': model.classifier.parameters(), 'lr': 1e-3} ], lr=1e-2, momentum=0.9)

But here, both base and classifier are entire blocks. In the VGG16 network for example, I want to change the learning rate for classifier[0] / classifier[3] / classifier[6], which are linear layers. Any ideas as to how that can be accomplished?

VGG16 network:
VGG(
  (features): Sequential(
    (0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): ReLU(inplace)
    (2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (3): ReLU(inplace)
    (4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (5): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (6): ReLU(inplace)
    (7): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (8): ReLU(inplace)
    (9): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (10): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (11): ReLU(inplace)
    (12): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (13): ReLU(inplace)
    (14): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (15): ReLU(inplace)
    (16): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (17): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (18): ReLU(inplace)
    (19): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (20): ReLU(inplace)
    (21): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (22): ReLU(inplace)
    (23): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (24): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (25): ReLU(inplace)
    (26): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (27): ReLU(inplace)
    (28): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (29): ReLU(inplace)
    (30): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (avgpool): AdaptiveAvgPool2d(output_size=(7, 7))
  (classifier): Sequential(
    (0): Linear(in_features=25088, out_features=4096, bias=True)
    (1): ReLU(inplace)
    (2): Dropout(p=0.5)
    (3): Linear(in_features=4096, out_features=4096, bias=True)
    (4): ReLU(inplace)
    (5): Dropout(p=0.75)
    (6): Linear(in_features=4096, out_features=10, bias=True)
    (7): Softmax()
  )
)

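You just need to create more parameter groups, the same way you did above.
Run

for name, param in model.named_parameters():
    ...  # filter the parameters by name and collect them into a list of dicts
optim.SGD(param_groups)  # param_groups is that list of dicts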

The only constraint is that a parameter cannot appear in more than one group. So if you split up the classifier's parameters, you have to assign every one of them to a group this way.
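To see that constraint in action, here is a minimal sketch on a small toy nn.Sequential model (a hypothetical stand-in, not the VGG16 above). It passes the last layer twice, once through model.parameters() and once in its own group, which the optimizer should reject:

```python
import torch.nn as nn
from torch.optim import SGD

# Toy model used only for illustration (not the VGG16 from the question).
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))

error = None
try:
    SGD(
        [
            {'params': model.parameters()},                 # contains every layer
            {'params': model[2].parameters(), 'lr': 1e-3},  # last layer again
        ],
        lr=1e-2,
        momentum=0.9,
    )
except ValueError as exc:
    # The optimizer refuses parameters that show up in two groups.
    error = exc

print(type(error).__name__, error)
```

The groups have to be disjoint, so the first group must leave out whatever the second group contains.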

Thank you for the response. Here is what I did:

my_list = ['classifier.3.weight', 'classifier.3.bias']
params = list(filter(lambda kv: kv[0] in my_list, vgg16.named_parameters()))
base_params = list(filter(lambda kv: kv[0] not in my_list, vgg16.named_parameters()))

And then defined the optimizer:

optimizer = SGD([{'params': base_params}, {'params': params, 'lr': '1e-4'}], lr=3e-6, momentum=0.9)

However, I get the following error:

TypeError: optimizer can only optimize Tensors, but one of the params is tuple

I get the same error when I try this as well:

optimizer = SGD([{'params': base_params, 'lr': 3e-6, 'momentum': 0.9 }, {'params': params, 'lr': 1e-4, 'momentum': 0.9}])

I am not entirely sure what I need to change. Any ideas?
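For reference, the tuple in that error message comes from named_parameters() itself: it yields (name, parameter) pairs, and filter() keeps those pairs intact. A small sketch on a toy nn.Sequential model (an assumption for illustration, standing in for vgg16) shows what the lists contain and one way to keep only the tensors:

```python
import torch
import torch.nn as nn

# Toy stand-in for vgg16, used only to inspect what named_parameters() yields.
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))

named = list(model.named_parameters())
first = named[0]
print(type(first).__name__, first[0])  # each entry is a (name, parameter) tuple

# Dropping the names leaves plain tensors, which is what an optimizer expects.
tensors_only = [param for _, param in named]
print(all(isinstance(p, torch.Tensor) for p in tensors_only))
```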

Try this, hope it helps!

optimizer = SGD([{'params': model.classifier[0].parameters(), 'lr': 3e-6, 'momentum': 0.9},
                 {'params': model.classifier[1].parameters(), 'lr': 1e-4, 'momentum': 0.9}])

Please correct me if I am wrong, but here the learning rates have been set only for two layers of the network: classifier[0] and classifier[1]. The rest of the network is not passed to the optimizer at all, so it has no learning rate associated with it.
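One way to check this concern is to compare the parameters held in optimizer.param_groups with those of the model. The sketch below uses a toy nn.Sequential model (hypothetical, not the VGG16 above) and gives the optimizer only the first layer:

```python
import torch
import torch.nn as nn
from torch.optim import SGD

# Toy model for illustration; only the first layer is handed to the optimizer.
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
optimizer = SGD([{'params': model[0].parameters(), 'lr': 1e-2}])

# Collect the identities of all parameters the optimizer knows about.
covered = {id(p) for group in optimizer.param_groups for p in group['params']}
missing = [name for name, p in model.named_parameters() if id(p) not in covered]
print(missing)

# A training step leaves the uncovered layer untouched.
before = model[2].weight.detach().clone()
model(torch.randn(3, 4)).sum().backward()
optimizer.step()
unchanged = torch.equal(before, model[2].weight)
print(unchanged)
```

If the list of missing names is not empty, those layers receive gradients but are never updated.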

What I wish to accomplish is to change the learning rate for a single layer only (in a Sequential block), and have a common learning rate for the rest of the layers.

try this

optimizer = SGD([{'params': model.classifier[0].parameters(), 'lr': 3e-6, 'momentum': 0.9 }], 
                 model.parameters,lr=1e-2 ,momentum=0.9
)

Hi, when I try this, it returns the following error:

TypeError: __init__() got multiple values for argument 'lr'

I am not quite sure what change I need to make. As @JuanFMontesinos mentioned, I think I need to specify separate parameter lists for each learning rate. Though I don’t know how to do that, given the error I mentioned earlier:

TypeError: optimizer can only optimize Tensors, but one of the params is tuple

Any ideas?
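As far as I can tell, the "multiple values for argument 'lr'" error is plain Python argument binding: the second positional argument of SGD is lr, so model.parameters lands in that slot and then clashes with the lr=1e-2 keyword. A minimal reproduction on a toy model (a hypothetical stand-in for the VGG16 above):

```python
import torch.nn as nn
from torch.optim import SGD

# Toy model for illustration only.
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))

message = None
try:
    # model.parameters is taken as the positional 'lr', which then
    # collides with the explicit lr=1e-2 keyword argument.
    SGD([{'params': model[0].parameters(), 'lr': 3e-6}],
        model.parameters, lr=1e-2, momentum=0.9)
except TypeError as exc:
    message = str(exc)

print(message)
```

All parameters therefore have to go into the first argument, as one list of groups, rather than being spread over several positional arguments.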

I’m afraid each parameter can only be assigned to one group. With the code @sai_tharun suggested, the parameters of classifier[0] are passed twice.

The problem is that once you put some of the classifier's parameters in their own group, you need to assign all the remaining classifier parameters to a group as well.
As I mentioned above, you need to do something like:

for name, param in model.classifier.named_parameters():
    if name == yourlayer_name:
        selected.append(…)
    else:
        others.append(…)

Of course, you also have to pass the rest of the network (everything that is not model.classifier), as you were doing.
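A runnable sketch of that loop, under some assumptions: TinyNet is a hypothetical small model with features and classifier blocks standing in for VGG16, and the layer being singled out is classifier[2]. Note that names yielded by model.classifier.named_parameters() are relative to the classifier, e.g. '2.weight' rather than 'classifier.2.weight'.

```python
import torch.nn as nn
from torch.optim import SGD


class TinyNet(nn.Module):
    """Hypothetical stand-in for VGG16 with the same two block names."""

    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(nn.Linear(4, 8), nn.ReLU())
        self.classifier = nn.Sequential(nn.Linear(8, 8), nn.ReLU(), nn.Linear(8, 2))

    def forward(self, x):
        return self.classifier(self.features(x))


model = TinyNet()
selected, others = [], []
for name, param in model.classifier.named_parameters():
    # Names here are relative to the classifier block: '0.weight', '2.bias', ...
    if name.startswith('2.'):
        selected.append(param)
    else:
        others.append(param)

optimizer = SGD(
    [
        {'params': model.features.parameters()},  # rest of the network
        {'params': others},                       # remaining classifier layers
        {'params': selected, 'lr': 1e-4},         # the one layer with its own lr
    ],
    lr=3e-6,
    momentum=0.9,
)
lrs = [group['lr'] for group in optimizer.param_groups]
print(lrs)
```

Groups without an explicit 'lr' fall back to the default passed to SGD, so only the selected layer ends up with 1e-4.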

I am not quite sure what you mean.

As you can see, the following code (similar to what @ptrblck details in the linked post):

my_list = ['classifier.3.weight', 'classifier.3.bias']
params = list(filter(lambda kv: kv[0] in my_list, vgg16.named_parameters()))
base_params = list(filter(lambda kv: kv[0] not in my_list, vgg16.named_parameters()))

returns two lists: the parameters of the layer(s) that require a different learning rate (classifier[3] in this case), and those of the rest of the network. I am having trouble passing these to the optimizer, like so:

optimizer = SGD([{'params': base_params}, {'params': params, 'lr': '1e-4'}], lr=3e-6, momentum=0.9)

I think this is incorrect, since it gives me errors. Any thoughts, @ptrblck, @JuanFMontesinos?

Hi,
That way you are also passing the names, since params contains (name, parameter) tuples.
Could you please use this instead:

from torchvision.models import vgg16
from torch.optim import SGD

model = vgg16()
my_list = ['classifier.3.weight', 'classifier.3.bias']
# named_parameters() yields (name, parameter) tuples; keep only the parameters
params = [p for name, p in model.named_parameters() if name in my_list]
base_params = [p for name, p in model.named_parameters() if name not in my_list]
# 'lr' must be a number, so pass 1e-4 rather than the string '1e-4'
optimizer = SGD([{'params': base_params}, {'params': params, 'lr': 1e-4}], lr=3e-6, momentum=0.9)

This is ok, I promise :)
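A possible alternative that avoids hard-coding name strings is to index the layer directly and split by parameter identity. This is only a sketch: the toy model below is an assumption (a small nn.Sequential with features and classifier blocks, where classifier[2] plays the role of classifier[3] in VGG16).

```python
from collections import OrderedDict

import torch
import torch.nn as nn
from torch.optim import SGD

# Toy stand-in for vgg16 with named 'features' and 'classifier' blocks.
model = nn.Sequential(OrderedDict([
    ('features', nn.Sequential(nn.Linear(4, 8), nn.ReLU())),
    ('classifier', nn.Sequential(nn.Linear(8, 8), nn.ReLU(), nn.Linear(8, 2))),
]))

# Parameters of the single layer that gets its own learning rate.
special = list(model.classifier[2].parameters())
special_ids = {id(p) for p in special}
# Everything else, selected by identity so no parameter is repeated.
base = [p for p in model.parameters() if id(p) not in special_ids]

optimizer = SGD([{'params': base}, {'params': special, 'lr': 1e-4}],
                lr=3e-6, momentum=0.9)

# Check the learning rate of each group and run one step as a smoke test.
group_lrs = [group['lr'] for group in optimizer.param_groups]
model(torch.randn(3, 4)).sum().backward()
optimizer.step()
print(group_lrs)
```

Inspecting optimizer.param_groups like this is a quick way to confirm that each group ended up with the intended learning rate.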

This works! Thank you very much!