Learning rate warm-up with SGD

I am looking for a way to do epoch warm-ups / learning rate warm-ups with SGD, but I can’t find anything useful. The best thing I could find is this repository: https://github.com/Tony-Y/pytorch_warmup, but it seems to be for Adam, and I am looking for a way to do this with SGD. Can someone point me in the right direction?
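For context, the only thing I have come up with so far is a hand-rolled linear warmup via `LambdaLR` (a rough sketch with made-up hyperparameters, not from that package), but I was hoping there is a more standard way:

```python
# Rough sketch: hand-rolled linear warmup for SGD using LambdaLR.
# The model, base_lr, warmup_epochs, and total_epochs are placeholder
# values just to illustrate the idea.
import torch
from torch import nn

model = nn.Linear(10, 2)                       # placeholder model
base_lr = 0.1
warmup_epochs = 5
total_epochs = 30

optimizer = torch.optim.SGD(model.parameters(), lr=base_lr, momentum=0.9)

def warmup_lambda(epoch):
    # Ramp the LR linearly from base_lr / warmup_epochs up to base_lr,
    # then hold it constant (chain another scheduler afterwards if needed).
    if epoch < warmup_epochs:
        return (epoch + 1) / warmup_epochs
    return 1.0

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=warmup_lambda)

for epoch in range(total_epochs):
    # ... normal training loop for one epoch goes here ...
    optimizer.step()                           # stand-in for the real update
    scheduler.step()                           # advance the warmup once per epoch
```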

Have you tried using this package with other optimizers?
And if so, do you get an error? From the docs I cannot see that it is limited to Adam, apart from the special warmup for RAdam.

Hi, I think you may have missed this description in the pytorch_warmup docs:

The warmup factor depends on Adam's `beta2` parameter for `RAdamWarmup` . Please see the original paper for the details.

The author says that the warmup factor depends on Adam's parameters, so I think pytorch_warmup cannot be used with the SGD optimizer. I may check the code for more details when I have time.

Have a look at my project for an easy way to use learning rate warmup methods!

The Untuned warmup methods rely on Adam’s `beta2` parameter. But if you just use the standard linear or exponential warmup, where you specify the number of warmup steps, I believe it can be used with any optimiser.
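For example, something along these lines should work with SGD, going by the usage pattern in the package's README (the learning rate, `warmup_period`, and step counts below are arbitrary examples, and the exact API may differ between versions):

```python
# Hedged sketch: pytorch_warmup's plain LinearWarmup combined with SGD,
# following the README's usage pattern. Hyperparameters are arbitrary.
import torch
from torch import nn
import pytorch_warmup as warmup

model = nn.Linear(10, 2)                       # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

num_steps = 1000
lr_scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=num_steps)
warmup_scheduler = warmup.LinearWarmup(optimizer, warmup_period=100)  # linear warmup over 100 steps

for step in range(num_steps):
    # ... forward pass, loss.backward(), etc. ...
    optimizer.step()
    with warmup_scheduler.dampening():         # scales the LR by the warmup factor
        lr_scheduler.step()
```

Only `RAdamWarmup` and the `Untuned*` classes read Adam-specific parameters; the plain linear/exponential warmup just rescales whatever learning rate the optimizer is using.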