Why doesn't the learning rate (LR) go below 1e-08 in PyTorch?

I am training a model. To overcome overfitting I have tried various things (data augmentation, etc.). I am also using a learning rate scheduler (with both SGD and Adam): when the validation metric plateaus (I also tried a step schedule), the learning rate is reduced by a factor, but it never goes below 1e-08, and my model's validation metric gets stuck from that point on. I tried passing a smaller eps to Adam, but the LR still got stuck at 1e-08. Passing a weight decay doesn't change the situation either, and neither does setting amsgrad to True.
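Here is roughly what my setup looks like, boiled down to a toy loop with a constant validation loss so the stall is easy to reproduce (the model and the hyperparameter values are just placeholders):

```python
from torch import nn, optim
from torch.optim.lr_scheduler import ReduceLROnPlateau

model = nn.Linear(10, 1)  # stand-in for my actual model

optimizer = optim.Adam(model.parameters(), lr=1e-3, eps=1e-12,
                       weight_decay=1e-5, amsgrad=True)
scheduler = ReduceLROnPlateau(optimizer, mode='min', factor=0.1, patience=0)

for epoch in range(20):
    val_loss = 1.0  # pretend the validation loss has plateaued completely
    scheduler.step(val_loss)
    # the printed lr shrinks by 10x each epoch until it reaches 1e-08, then stops
    print(epoch, optimizer.param_groups[0]['lr'])
```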

I did some research; people suggest that the Adam optimizer has some inherent problems, but nothing is said about the learning rate, and every discussion adds that there is no such problem with SGD.

Why is this? Is it a bug, or is it designed this way because the authors consider anything below that a meaninglessly small value? It seems a smaller learning rate would really help for my dataset, because everything looks fine right up until the learning rate drops to 1e-08.

Have you tried setting the eps argument of your ReduceLROnPlateau scheduler?
Adam's eps is a different parameter (a term added to the denominator of the update for numerical stability) from the eps used by the scheduler.
From the docs:

eps (float) – Minimal decay applied to lr. If the difference between new and old lr is smaller than eps, the update is ignored. Default: 1e-8.

If you set it to a smaller value, the scheduler should continue decreasing the learning rate.
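For example (the factor and patience values are just placeholders; only the eps value matters here):

```python
from torch import nn, optim
from torch.optim.lr_scheduler import ReduceLROnPlateau

model = nn.Linear(10, 1)  # placeholder model
optimizer = optim.Adam(model.parameters(), lr=1e-3)

# With the scheduler's default eps=1e-8, once the lr reaches 1e-08 the proposed
# new lr is 1e-09; the difference (9e-09) is smaller than eps, so the update is
# skipped and the lr stays at 1e-08. A smaller eps lets the reductions continue.
scheduler = ReduceLROnPlateau(optimizer, mode='min', factor=0.1,
                              patience=5, eps=1e-12)
```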


Thank you for your quick response, I had completely missed the eps parameter of the ReduceLROnPlateau scheduler. I tried it and it works! Thank you.
