Learning rate decay combined with differential learning rates

Hi,
I am using differential learning rates for different layers, but at the same time I am also using LR decay (I might change to the OneCycle policy). I wanted to know which layers the LR decay is applied to. Below is the sample code:

```python
optimizer = optim.SGD(
    [{'params': model.base.parameters()},
     {'params': model.classifier.parameters(), 'lr': 1e-3}],
    lr=1e-2, momentum=0.9)

scheduler = MultiStepLR(optimizer, milestones=[5, 10], gamma=0.1)
```

Is the LR decay applied only to the base layers, or to both the base and classifier layers?

LR schedulers loop over all parameter groups, so the decay is applied to both; i.e. the classifier lr will always be a tenth of the base lr in your example.
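
A minimal sketch to see this, using a toy model with base and classifier parts as stand-ins for your actual model:

```python
import torch
from torch import nn, optim
from torch.optim.lr_scheduler import MultiStepLR

# Toy stand-in model, just so the sketch runs; your real model layout will differ.
model = nn.Module()
model.base = nn.Linear(10, 10)
model.classifier = nn.Linear(10, 2)

optimizer = optim.SGD(
    [{'params': model.base.parameters()},
     {'params': model.classifier.parameters(), 'lr': 1e-3}],
    lr=1e-2, momentum=0.9)
scheduler = MultiStepLR(optimizer, milestones=[5, 10], gamma=0.1)

for epoch in range(12):
    # ... training step would go here ...
    optimizer.step()
    scheduler.step()
    # Both groups are multiplied by gamma at each milestone.
    print(epoch, [group['lr'] for group in optimizer.param_groups])
```

At each milestone both learning rates are multiplied by gamma, so the ratio between them stays the same.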

Best regards

Thomas

Is there a way to keep the classifier learning rate constant while only the base learning rate decays?

Yes, put the parameters into two separate optimizers and attach the scheduler only to the base one. That should not make much of a difference w.r.t. performance; you just need to call zero_grad and step on both.
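
A rough sketch of what that could look like, again with toy stand-ins for the model and data (adapt it to your own training loop); the scheduler wraps only the base optimizer, so the classifier lr stays at 1e-3:

```python
import torch
from torch import nn, optim
from torch.optim.lr_scheduler import MultiStepLR

# Toy stand-ins just to make the sketch runnable; use your own model and data.
model = nn.Module()
model.base = nn.Linear(10, 10)
model.classifier = nn.Linear(10, 2)
criterion = nn.CrossEntropyLoss()

# One optimizer per parameter set; only the base optimizer gets the scheduler.
opt_base = optim.SGD(model.base.parameters(), lr=1e-2, momentum=0.9)
opt_clf = optim.SGD(model.classifier.parameters(), lr=1e-3, momentum=0.9)
scheduler = MultiStepLR(opt_base, milestones=[5, 10], gamma=0.1)

for epoch in range(12):
    # Dummy batch; in practice loop over your DataLoader here.
    inputs = torch.randn(8, 10)
    targets = torch.randint(0, 2, (8,))

    opt_base.zero_grad()
    opt_clf.zero_grad()
    out = model.classifier(model.base(inputs))
    loss = criterion(out, targets)
    loss.backward()
    opt_base.step()
    opt_clf.step()

    scheduler.step()  # decays only the base lr; the classifier lr stays at 1e-3
    print(epoch, opt_base.param_groups[0]['lr'], opt_clf.param_groups[0]['lr'])
```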