Learning rate decay combined with differential learning rates

I am using differential learning rates for different layers, and at the same time I am using LR decay (I might switch to the OneCycle policy). I wanted to know which layers the LR decay is applied to; below is sample code:

from torch import optim
from torch.optim.lr_scheduler import MultiStepLR

optimizer = optim.SGD([
    {'params': model.base.parameters()},                    # uses default lr=1e-2
    {'params': model.classifier.parameters(), 'lr': 1e-3},
], lr=1e-2, momentum=0.9)

scheduler = MultiStepLR(optimizer, milestones=[5, 10], gamma=0.1)

Is the LR decay applied only to the base layers, or to both the classifier and base layers?

The schedulers loop over all parameter groups, so the decay is applied to both; the classifier lr will always be a tenth of the base lr in your example.
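A minimal sketch illustrating this, using a tiny stand-in model (the `Net` class here is hypothetical, just to provide `base` and `classifier` submodules like in your snippet): after each milestone, both groups' learning rates are multiplied by `gamma`.

```python
import torch
from torch import nn, optim
from torch.optim.lr_scheduler import MultiStepLR

# Hypothetical stand-in for a model with "base" and "classifier" parts.
class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.base = nn.Linear(4, 4)
        self.classifier = nn.Linear(4, 2)

model = Net()
optimizer = optim.SGD([
    {'params': model.base.parameters()},                    # default lr=1e-2
    {'params': model.classifier.parameters(), 'lr': 1e-3},
], lr=1e-2, momentum=0.9)
scheduler = MultiStepLR(optimizer, milestones=[5, 10], gamma=0.1)

for epoch in range(12):
    # ... forward/backward would go here ...
    optimizer.step()      # step the optimizer before the scheduler
    scheduler.step()

# Both groups decayed by gamma at epochs 5 and 10:
print([g['lr'] for g in optimizer.param_groups])
```

The classifier lr stays at a tenth of the base lr throughout, since the same multiplicative factor is applied to every group.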

Best regards


Is there a way to keep the classifier learning rate constant while only the base learning rate decays?

Yes, put the parameters into two optimizers and attach the scheduler only to the base optimizer. That should not make much of a difference w.r.t. performance; you just need to call zero_grad and step on both.