I am training a model, and to combat overfitting I have applied optimization tweaks, data augmentation, etc. I use a learning-rate scheduler (tried with both SGD and Adam): when the validation metric plateaus (I also tried a step schedule), the learning rate is reduced by a factor until it reaches 1e-08, but it won't go below that, and my model's validation performance gets stuck after this point. I tried passing a smaller value for the epsilon parameter to Adam, but the LR still got stuck at 1e-08. Passing a weight decay didn't change the situation either, and neither did setting amsgrad to True. A sketch of the setup is below.
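For concreteness, here is a minimal sketch of what I mean (the model, data, and hyperparameter values like lr=1e-3, factor=0.1, and patience=5 are just illustrative stand-ins; my real code steps the scheduler on the validation loss):

```python
import torch
from torch.optim import Adam
from torch.optim.lr_scheduler import ReduceLROnPlateau

# Dummy model and data so the snippet runs standalone.
model = torch.nn.Linear(10, 1)
inputs, targets = torch.randn(32, 10), torch.randn(32, 1)
loss_fn = torch.nn.MSELoss()

# Adam with everything I tried: a smaller eps, weight decay, amsgrad=True.
optimizer = Adam(model.parameters(), lr=1e-3,
                 eps=1e-12, weight_decay=1e-4, amsgrad=True)

# Cut the LR by a factor of 10 whenever the monitored metric plateaus.
scheduler = ReduceLROnPlateau(optimizer, mode='min', factor=0.1, patience=5)

for epoch in range(200):
    optimizer.zero_grad()
    loss = loss_fn(model(inputs), targets)
    loss.backward()
    optimizer.step()
    scheduler.step(loss.item())  # in my real code this is the validation loss
    # In my runs, the printed LR bottoms out at 1e-08 and never goes lower.
    print(epoch, optimizer.param_groups[0]['lr'])
```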
I did some research, and people suggest that the Adam optimizer has inherent problems, but nothing is said about this learning-rate floor, and every discussion adds that with SGD there is no problem.
Why is this? Is it a bug, or is it designed this way because the authors consider anything smaller a meaninglessly small value? It seems like a smaller learning rate would really help on my dataset, because everything looks fine right up until the learning rate bottoms out at 1e-08.