Problem with different learning rates and weight decay in different layers

I know I can use named_parameters() to do that.
However, when I run a simple test, I hit a bug.

import torch
import torch.nn as nn
import torch.optim as optim

if __name__ == '__main__':

    module = nn.Sequential(nn.Linear(2,3),nn.Linear(3,2))
    params_dict = dict(module.named_parameters())
    params = []
    for key, value in params_dict.items():
        if key[-4:] == 'bias':
            params += [{'params':value,'lr':0.0}]
        else:
            params += [{'params':value,'lr':0.1}]
    op = optim.SGD(params, momentum=0.9)

The error message:

Traceback (most recent call last):
  File "test_lr.py", line 15, in <module>
    op = optim.SGD(params, momentum=0.9)
  File "/home/v-yawan1/anaconda2/lib/python2.7/site-packages/torch/optim/sgd.py", line 56, in __init__
    super(SGD, self).__init__(params, defaults)
  File "/home/v-yawan1/anaconda2/lib/python2.7/site-packages/torch/optim/optimizer.py", line 61, in __init__
    raise ValueError("can't optimize a non-leaf Variable")
ValueError: can't optimize a non-leaf Variable


This actually seems like a bug. I’m investigating / isolating it. Let me get back to you on this, and sorry about that.


This was fixed on master via https://github.com/pytorch/pytorch/commit/a76098ac1532d5e9ee24b4776258ae731627f8e3 and will be included in the next release.

For now, your workaround is to wrap each parameter in a list:

import torch.nn as nn
import torch.optim as optim

if __name__ == '__main__':

    module = nn.Sequential(nn.Linear(2,3),nn.Linear(3,2))
    params_dict = dict(module.named_parameters())
    params = []
    for key, value in params_dict.items():
        if key[-4:] == 'bias':
            params += [{'params':[value],'lr':0.0}]
        else:
            params += [{'params':[value],'lr':0.1}]
    op = optim.SGD(params, momentum=0.9)
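On versions of PyTorch that include the fix above, the same effect can also be written with just two parameter groups — one for all biases, one for everything else — instead of one group per tensor. A minimal sketch (not from the original posts):

```python
import torch.nn as nn
import torch.optim as optim

module = nn.Sequential(nn.Linear(2, 3), nn.Linear(3, 2))

# Collect all biases into one group and everything else into another,
# rather than creating a separate group per parameter tensor.
biases = [p for name, p in module.named_parameters() if name.endswith('bias')]
others = [p for name, p in module.named_parameters() if not name.endswith('bias')]

op = optim.SGD([{'params': biases, 'lr': 0.0},
                {'params': others, 'lr': 0.1}],
               momentum=0.9)
```

Each group keeps its own hyperparameters, so per-group weight decay works the same way (e.g. 'weight_decay': 0.0 in the bias group).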

I ran into a problem that I think is related.
I have a CNN which I created using nn.ModuleList:

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.modules = list()
        self.modules.append(nn.Conv2d(1, 32, (3, 3), (1, 1), (1, 1)))
        self.modules.append(nn.PReLU(32))
        # etc.: keep adding some modules, the last one being No.14
        self.model = nn.ModuleList(self.modules)

Then, in my main file, when I create the optimizer, I write the following:

model = Net()
model = model.cuda()
optimizer = optim.Adam([{'params': model.modules[14].parameters(), 'lr': 0.1*opt.lr}], lr=opt.lr, weight_decay=1e-5)

With this line I expected the learning rate to equal opt.lr for all modules EXCEPT module #14, which should get a 10x lower learning rate. However, I get very strange convergence. To check whether that has something to do with the initialization, I removed the “0.1” from the optimizer initialization, so that module #14 has the same “lr” as every other module in my Net. BUT the same strange convergence persists: training is much slower than with a plain optimizer initialization (optimizer = optim.Adam(model.parameters(), lr=opt.lr, weight_decay=1e-5)). What might be the issue? Thanks!


Looks like you’re not passing the optimizer the rest of the model’s parameters, only the parameters of model.modules[14]. Instead try:

optimizer = optim.Adam([{'params': model.modules[14].parameters(), 'lr': 0.1*opt.lr}, {'params': [module.parameters() for index, module in enumerate(model.modules) if index != 14]}], lr=opt.lr, weight_decay=1e-5)


Thanks! That sounds like a good idea; however, after correcting the code according to this suggestion, I now get an error:
TypeError: optimizer can only optimize Variables, but one of the params is parameters

Alexey,

This part of your latest code is wrong:

{'params': [module.parameters() for index, module in enumerate(model.modules) if index != 14]}

It puts the Python iterable module.parameters() itself into a list, hence the error message. You can convert an iterable into a list, for example via list(module.parameters()).
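To make the distinction concrete, here is a small sketch (not from the original posts): parameters() returns a generator, so a list comprehension over modules produces a list of generator objects unless each one is flattened into its tensors.

```python
import torch.nn as nn

layers = [nn.Linear(3, 3) for _ in range(2)]

# Wrong: a list of generator objects, not of Parameter tensors --
# this is exactly what triggers "one of the params is parameters".
wrong = [m.parameters() for m in layers]
print(type(wrong[0]))  # <class 'generator'>

# Right: flatten each generator so the list holds the tensors themselves.
right = [p for m in layers for p in m.parameters()]
print(len(right))  # 2 layers x (weight, bias) = 4
```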

Soumith,

If I do
optimizer = optim.Adam([{'params': model.modules[14].parameters(), 'lr': 0.1*opt.lr}, {'params': list(module.parameters()) for index, module in enumerate(model.modules) if index != 14}], lr=opt.lr, weight_decay=1e-5)

then the convergence is again very slow, as if I’m not passing all my params to the optimizer. Or did I put the list() in the wrong place?

So far I tried two ways:

optimizer = optim.Adam([{'params': model.modules[14].parameters(), 'lr': 0.1*opt.lr}, {'params': model.modules[0].parameters(), 'lr':opt.lr}, {'params': model.modules[1].parameters(), 'lr':opt.lr}, {'params': model.modules[2].parameters(), 'lr':opt.lr}, {'params': model.modules[3].parameters(), 'lr':opt.lr}, {'params': model.modules[4].parameters(), 'lr':opt.lr}, {'params': model.modules[5].parameters(), 'lr':opt.lr}, {'params': model.modules[6].parameters(), 'lr':opt.lr}, {'params': model.modules[7].parameters(), 'lr':opt.lr}, {'params': model.modules[8].parameters(), 'lr':opt.lr}, {'params': model.modules[9].parameters(), 'lr':opt.lr}, {'params': model.modules[10].parameters(), 'lr':opt.lr}, {'params': model.modules[11].parameters(), 'lr':opt.lr}, {'params': model.modules[12].parameters(), 'lr':opt.lr}, {'params': model.modules[13].parameters(), 'lr':opt.lr}], lr=opt.lr, eps=1e-8, weight_decay=1e-5)
and this works very well. But, obviously, it’s not satisfactory w.r.t. readability.

And I tried the second way:
optimizer = optim.Adam([{'params': model.modules[14].parameters(), 'lr': 0.1*opt.lr}, {'params': module.parameters() for index, module in enumerate(model.modules) if index != 14}], lr=opt.lr, eps=1e-8, weight_decay=1e-5)

This is much clearer and more Pythonic, but it just doesn’t work.

UPD:
I changed to a more Pythonic way, and this time it works:

ml = list()
# Module #14 gets a 10x lower learning rate; every other module
# falls back to the optimizer-wide default lr (opt.lr).
ml.append({'params': model.modules[14].parameters(), 'lr': 0.1*opt.lr})
for index, module in enumerate(model.modules):
    if index != 14:
        ml.append({'params': module.parameters()})
optimizer = optim.Adam(ml, lr=opt.lr, eps=1e-8, weight_decay=1e-5)
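As a side note, the same grouping can be written in a single expression with itertools.chain, which flattens the per-module generators into one iterable. A sketch with stand-ins for the thread’s setup (a plain 15-module list replaces the Net, and base_lr replaces opt.lr):

```python
import itertools

import torch.nn as nn
import torch.optim as optim

# Stand-ins: 15 modules (indices 0..14) and a base learning rate.
modules = [nn.Linear(4, 4) for _ in range(15)]
base_lr = 0.01

# Module #14 gets its own group at 10x lower lr; all other modules'
# parameters are chained together into a single default-lr group.
optimizer = optim.Adam(
    [{'params': modules[14].parameters(), 'lr': 0.1 * base_lr},
     {'params': itertools.chain(
         *(m.parameters() for i, m in enumerate(modules) if i != 14))}],
    lr=base_lr, eps=1e-8, weight_decay=1e-5)
```

The optimizer materializes the chain into a list internally, so each group still holds concrete Parameter tensors.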

Thanks Soumith and Andy!
