I am using the OneCycleLR scheduler. It has a max_lr parameter, which the docs describe as: "Upper learning rate boundaries in the cycle for each parameter group."
I am also using an Adam optimizer, which has its own learning rate parameter.
Will the optimizer's learning rate be overwritten by the scheduler's? How do the two relate?
I would guess that the only point of specifying a learning rate in the optimizer is when you do not use any scheduler, in which case the learning rate stays constant throughout training, but I am not entirely sure.
Creating the OneCycleLR scheduler will change the learning rate of the optimizer, so there is not really a point in specifying the learning rate in the optimizer if you use a scheduler. The optimizer still has a learning rate when it is created (Adam defaults to 0.001 if you don't pass one), but as soon as you hand the optimizer to OneCycleLR, the scheduler overwrites that value: at construction it sets the learning rate to max_lr / div_factor, and every scheduler.step() after that rewrites it to follow the one-cycle shape.
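A minimal sketch (the Linear model and total_steps=100 are arbitrary placeholders, just to make it runnable) shows the lr passed to Adam being replaced as soon as the scheduler is built, and again on every scheduler.step():

```python
import torch
from torch.optim.lr_scheduler import OneCycleLR

model = torch.nn.Linear(10, 1)          # throwaway model, just to have parameters

optimizer = torch.optim.Adam(model.parameters(), lr=0.1)
print(optimizer.param_groups[0]["lr"])  # 0.1 -- the value given to Adam

# Constructing the scheduler immediately overwrites the optimizer's lr with the
# schedule's starting value, max_lr / div_factor (div_factor defaults to 25).
scheduler = OneCycleLR(optimizer, max_lr=0.01, total_steps=100)
print(optimizer.param_groups[0]["lr"])  # 0.0004 == 0.01 / 25

# Each scheduler.step() (called after optimizer.step()) rewrites the lr again,
# ramping it up toward max_lr and then annealing it back down.
for _ in range(5):
    loss = model(torch.randn(4, 10)).sum()
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    scheduler.step()
    print(optimizer.param_groups[0]["lr"])
```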
To play devil's advocate, the PyTorch docs show the optimizer being given a different learning rate (0.1) from the max_lr (0.01) in the OneCycleLR scheduler.
It seems strange that they would specify the optimizer's lr at all, much less set it to a different value, if it did nothing…
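For what it's worth, reproducing the docs example's values in a small sketch (SGD with lr=0.1 and max_lr=0.01; the model and the epochs/steps_per_epoch counts here are placeholders) suggests the 0.1 is indeed discarded the moment the scheduler is constructed:

```python
import torch
from torch.optim.lr_scheduler import OneCycleLR

model = torch.nn.Linear(10, 1)  # placeholder model

# Same pairing of values as in the docs: optimizer lr=0.1, scheduler max_lr=0.01.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
scheduler = OneCycleLR(optimizer, max_lr=0.01, steps_per_epoch=100, epochs=10)

# The 0.1 given to SGD is already gone; the schedule starts at max_lr / div_factor.
print(optimizer.param_groups[0]["lr"])  # 0.0004, not 0.1
```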