Two learning rate schedulers, one optimizer

Hi, I want to adjust the learning rate of one part of my model (let's call it PartA) using lr_schedulerA, and of PartB using lr_schedulerB.
I didn't find a way to do this with a single optimizer; the only solution I found is to duplicate my optimizer and put the parameters of each part in the corresponding optimizer:

    optimizerA = torch.optim.SGD(parametersA, args.lr,
                                 momentum=args.sgd_momentum,
                                 weight_decay=args.weight_decay)

    optimizerB = torch.optim.SGD(parametersB, args.lr,
                                 momentum=args.sgd_momentum,
                                 weight_decay=args.weight_decay)


And then:

    lr_schedulerA = torch.optim.lr_scheduler.MultiStepLR(optimizerA, milestones=[100, 150],
                                                         last_epoch=args.start_epoch - 1)
    lr_schedulerB = torch.optim.lr_scheduler.ExponentialLR(optimizerB, gamma=0.99)

Does anyone have a better idea to share with me?

Thanks!


Your approach looks reasonable, as you are not really duplicating the optimizer but rather using two different optimizers and schedulers for different parts of the model.
I think this setup is clean and quite easy to understand, so I would stick with it and not hack around the optimizer's param_groups to pass them to different schedulers.
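
For reference, continuing your snippet, the training loop with this setup could look roughly like this (model, criterion, and train_loader stand in for whatever you already have):

    for epoch in range(args.start_epoch, args.epochs):
        for inputs, targets in train_loader:
            optimizerA.zero_grad()
            optimizerB.zero_grad()
            loss = criterion(model(inputs), targets)
            loss.backward()
            # each optimizer only updates the parameters it was constructed with
            optimizerA.step()
            optimizerB.step()
        # each scheduler only adjusts the learning rate of its own optimizer
        lr_schedulerA.step()
        lr_schedulerB.step()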


Hello,

I have two parts in my model, let’s say ‘feature’ and ‘classifier’. In the optimizer, I have defined two learning rates for these parts. How can I set up the scheduler for these two?

    optimizer = optim.SGD([
        {'params': model.feature.parameters(), 'lr': 0.1},
        {'params': model.classifier.parameters(), 'lr': 1}
    ], lr=1e-2, momentum=0.9)

I want to use OneCycleLR, which needs a learning rate among its parameters. How should I pass two learning rates?

Oh, OneCycleLR accepts a list with one value per parameter group. I tested it, and it works fine.

Just one note: when you use OneCycleLR, the learning rate set in the optimizer does not matter. What you pass as max_lr determines your learning rate!
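
Here is a rough sketch of what I mean; the Net module, the step counts, and the dummy data are placeholders, not my actual code:

    import torch
    import torch.nn as nn
    import torch.optim as optim

    class Net(nn.Module):
        def __init__(self):
            super().__init__()
            self.feature = nn.Linear(10, 10)      # stand-in for the real feature extractor
            self.classifier = nn.Linear(10, 2)    # stand-in for the real classifier

        def forward(self, x):
            return self.classifier(self.feature(x))

    model = Net()
    optimizer = optim.SGD([
        {'params': model.feature.parameters(), 'lr': 0.1},
        {'params': model.classifier.parameters(), 'lr': 1}
    ], lr=1e-2, momentum=0.9)

    steps_per_epoch, epochs = 100, 10             # placeholder values
    # one max_lr entry per param group, in the same order as the groups above
    scheduler = optim.lr_scheduler.OneCycleLR(
        optimizer, max_lr=[0.1, 1.0],
        steps_per_epoch=steps_per_epoch, epochs=epochs)

    criterion = nn.CrossEntropyLoss()
    for step in range(steps_per_epoch * epochs):
        inputs = torch.randn(4, 10)               # dummy batch
        targets = torch.randint(0, 2, (4,))
        optimizer.zero_grad()
        loss = criterion(model(inputs), targets)
        loss.backward()
        optimizer.step()
        scheduler.step()                          # OneCycleLR is stepped once per batch

    print([group['lr'] for group in optimizer.param_groups])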

@NoobCoder Hi, I ran into this problem too. Could you share how you set up OneCycleLR for two different parameter groups with different learning rates? Thanks a lot!