I am using the `OneCycleLR` scheduler. It has a `max_lr` parameter, which the documentation describes as:

> Upper learning rate boundaries in the cycle for each parameter group.
I also have an `Adam` optimizer, which has its own learning rate parameter.
Will the optimizer’s learning rate be overwritten by the scheduler’s? How do they relate?
My guess is that the only point of specifying a learning rate in the optimizer is for when you do not use any scheduler, in which case the learning rate stays constant throughout training, but I am not entirely sure.
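
For reference, here is a minimal sketch of the setup I mean (the model, learning-rate values, and `total_steps` are just placeholders):

```python
import torch

# Placeholder model, just to make the example runnable
model = torch.nn.Linear(10, 1)

# The optimizer is given its own learning rate ...
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# ... and the scheduler is given a max_lr on top of that (values are arbitrary)
scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer, max_lr=1e-2, total_steps=1000
)

for step in range(3):
    optimizer.zero_grad()
    loss = model(torch.randn(4, 10)).sum()  # dummy forward/backward pass
    loss.backward()
    optimizer.step()
    scheduler.step()  # OneCycleLR is stepped once per batch
    print(optimizer.param_groups[0]["lr"])  # the lr the optimizer actually uses
```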