I am looking for a way to do epoch warmup / learning rate warmup with SGD, but I can't find anything useful. The best thing I could find was this repository: https://github.com/Tony-Y/pytorch_warmup, but it is for Adam, and I am looking for a way to do this with SGD. Can someone point me in the right direction?
Have you tried using this package with other optimizers? If so, do you get an error?
From the docs, I can't see that it is limited to Adam or to any other particular optimizer, apart from the special warmup for RAdam.
Hi, I think you may have missed this description of pytorch_warmup in the docs:
The warmup factor depends on Adam's `beta2` parameter for `RAdamWarmup`. Please see the original paper for the details.
The author says that the implementation depends on Adam's `beta2` parameter, so I think pytorch_warmup cannot be used with the SGD optimizer. I may check the code for more details when I have time.
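For what it's worth, a plain linear warmup does not need anything Adam-specific, so it can be done with PyTorch's built-in `torch.optim.lr_scheduler.LambdaLR` on top of `torch.optim.SGD`. Below is a minimal sketch, not taken from pytorch_warmup: the helper names `linear_warmup_factor` and `build_sgd_with_warmup`, the linear ramp, and the default values (`base_lr=0.1`, `momentum=0.9`, `warmup_steps=500`) are all assumptions chosen for illustration.

```python
def linear_warmup_factor(step, warmup_steps):
    """Multiplier applied to the base learning rate at a 0-indexed step.

    Ramps linearly from 1/warmup_steps up to 1.0, then stays at 1.0.
    (Hypothetical helper; the linear shape is an assumption.)
    """
    if warmup_steps <= 0:
        return 1.0  # no warmup requested
    return min(1.0, (step + 1) / warmup_steps)


def build_sgd_with_warmup(params, base_lr=0.1, momentum=0.9, warmup_steps=500):
    """Return an SGD optimizer plus a scheduler that warms up its learning rate.

    Hypothetical helper; the default values are placeholders, not recommendations.
    """
    # torch is imported here so the helper above stays usable without it.
    import torch

    optimizer = torch.optim.SGD(params, lr=base_lr, momentum=momentum)
    # LambdaLR multiplies the optimizer's initial lr by the returned factor.
    scheduler = torch.optim.lr_scheduler.LambdaLR(
        optimizer,
        lr_lambda=lambda step: linear_warmup_factor(step, warmup_steps),
    )
    return optimizer, scheduler


if __name__ == "__main__":
    # Show the multiplier for the first few steps of a 5-step warmup.
    for step in range(7):
        print(step, linear_warmup_factor(step, 5))
```

In a training loop you would call `scheduler.step()` after each `optimizer.step()`, so the warmup is counted in iterations. To warm up per epoch instead, call `scheduler.step()` once per epoch and set `warmup_steps` to the number of warmup epochs.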
Refer to my project for an easy way to use the learning rate warmup method!