Learning rate warm-up with SGD

I am looking for a way to do epoch warm-ups / learning rate warm-ups with SGD, but I can’t find anything useful. The best thing I could find is this repository: https://github.com/Tony-Y/pytorch_warmup, but it seems to be for Adam, and I am looking for a way to do this with SGD. Can someone point me in the right direction?
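For context, the only thing I have come up with so far is a hand-rolled linear warmup via `LambdaLR` (a rough sketch with made-up hyperparameters, not from that package), but I was hoping there is a more standard way:

```python
# Rough sketch: hand-rolled linear warmup for SGD using LambdaLR.
# The model, base_lr, warmup_epochs, and total_epochs are placeholder
# values just to illustrate the idea.
import torch
from torch import nn

model = nn.Linear(10, 2)                       # placeholder model
base_lr = 0.1
warmup_epochs = 5
total_epochs = 30

optimizer = torch.optim.SGD(model.parameters(), lr=base_lr, momentum=0.9)

def warmup_lambda(epoch):
    # Ramp the LR linearly from base_lr / warmup_epochs up to base_lr,
    # then hold it constant (chain another scheduler afterwards if needed).
    if epoch < warmup_epochs:
        return (epoch + 1) / warmup_epochs
    return 1.0

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=warmup_lambda)

for epoch in range(total_epochs):
    # ... normal training loop for one epoch goes here ...
    optimizer.step()                           # stand-in for the real update
    scheduler.step()                           # advance the warmup once per epoch
```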

Have you tried using this package with other optimizers?
And if so, do you get an error? From the docs I cannot see that it is limited to Adam, apart from the special warmup for RAdam.

Hi, I think you may have missed this description in the pytorch_warmup docs:

The warmup factor depends on Adam's `beta2` parameter for `RAdamWarmup` . Please see the original paper for the details.

The author says that the warmup factor depends on Adam's parameters, so I think pytorch_warmup cannot be used with the SGD optimizer. I may check the code for more details when I have time.

Have a look at my project for an easy way to use learning rate warmup methods!

The Untuned warmup methods rely on Adam’s `beta2` parameter. But if you just use the standard linear or exponential warmup, where you specify the number of warmup steps, I believe it can be used with any optimiser.
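For example, something along these lines should work with SGD, going by the usage pattern in the package's README (the learning rate, `warmup_period`, and step counts below are arbitrary examples, and the exact API may differ between versions):

```python
# Hedged sketch: pytorch_warmup's plain LinearWarmup combined with SGD,
# following the README's usage pattern. Hyperparameters are arbitrary.
import torch
from torch import nn
import pytorch_warmup as warmup

model = nn.Linear(10, 2)                       # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

num_steps = 1000
lr_scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=num_steps)
warmup_scheduler = warmup.LinearWarmup(optimizer, warmup_period=100)  # linear warmup over 100 steps

for step in range(num_steps):
    # ... forward pass, loss.backward(), etc. ...
    optimizer.step()
    with warmup_scheduler.dampening():         # scales the LR by the warmup factor
        lr_scheduler.step()
```

Only `RAdamWarmup` and the `Untuned*` classes read Adam-specific parameters; the plain linear/exponential warmup just rescales whatever learning rate the optimizer is using.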