Optimizer: set learning rate per parameter group

Hi.

I have two groups of parameters: one on which I would like to apply a learning rate of 0.1, and another on which I would like to apply a learning rate of 1e-5.

I tried the following piece of code:

optimizer = torch.optim.SGD([{'params': parameters_reduced, 'lr': 1e-5},
                             {'params': parameters_to_optimize, 'lr': 0.1}],
                            lr=0.1,
                            momentum=args.momentum,
                            weight_decay=args.weight_decay)

But it seems to have no effect on the training.
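
To double check that the per-group learning rates are actually registered, I print them right after building the optimizer. This is a minimal sketch with dummy tensors standing in for my real parameter groups, and with placeholder values for momentum and weight decay:

import torch

# Dummy tensors in place of my real parameter groups (placeholders, not my actual model)
parameters_reduced = [torch.nn.Parameter(torch.randn(3, 3))]
parameters_to_optimize = [torch.nn.Parameter(torch.randn(3, 3))]

optimizer = torch.optim.SGD([{'params': parameters_reduced, 'lr': 1e-5},
                             {'params': parameters_to_optimize, 'lr': 0.1}],
                            lr=0.1, momentum=0.9, weight_decay=1e-4)

# Each group keeps its own 'lr'; the top-level lr=0.1 is only the default
# for groups that do not specify one.
for i, group in enumerate(optimizer.param_groups):
    print(i, group['lr'])  # group 0 -> 1e-05, group 1 -> 0.1

The learning rates do show up per group as expected, so the group setup itself seems to be correct.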

For example, I see that assigning a learning rate of 0 to parameters_reduced and removing it from the optimizer entirely are not equivalent. This behavior is strange, because I would have expected the two following snippets to lead to the same result:

optimizer = torch.optim.SGD(param1, lr=0.1,
                            momentum=args.momentum,
                            weight_decay=args.weight_decay)

optimizer = torch.optim.SGD([{'params': param2, 'lr': 0},
                             {'params': param1, 'lr': 0.1}],
                            lr=0.1,
                            momentum=args.momentum,
                            weight_decay=args.weight_decay)

and yet, somehow, I get different results. (It is not a problem of random seeds, since all my seeds are initialized and the results are reproducible.)
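
To narrow this down, here is the kind of minimal comparison I have in mind, on a toy linear layer rather than my real model (param1 and param2 here are just the weight and bias of the toy layer, and the momentum/weight-decay values are placeholders):

import torch

torch.manual_seed(0)
model = torch.nn.Linear(4, 2)
param1 = [model.weight]  # the group I actually want to train
param2 = [model.bias]    # the group I want to keep effectively frozen

optimizer = torch.optim.SGD([{'params': param2, 'lr': 0},
                             {'params': param1, 'lr': 0.1}],
                            lr=0.1, momentum=0.9, weight_decay=1e-4)

x = torch.randn(8, 4)
model(x).sum().backward()

bias_before = model.bias.detach().clone()
optimizer.step()

# With 'lr': 0 I would expect the bias to stay exactly where it was,
# i.e. the same as if it had never been passed to the optimizer at all.
print(torch.allclose(model.bias, bias_before))

If this prints True, then the zero-learning-rate group should not be the source of the difference, which is exactly what confuses me about my real training run.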

Thanks