Hi, I do not fully understand the problem either. However, here are some thoughts on it:
- Your loss already decreases without explicit learning rate decay. Is there a particular reason you want to get learning rate decay working?
- Adam uses adaptive learning rates intrinsically. I guess for many problems that should be good enough. You can read more on this in this discussion on Stack Overflow.
- Adam (like many other common optimization algorithms) adapts to a specific machine learning problem by estimating first and second moments of the gradients. Creating a new optimizer every epoch should therefore degrade performance, because that accumulated information is lost.
- I feel like decreasing the learning rate by 75 % might be too aggressive for a momentum-based optimizer. It would be interesting to see whether reducing it by something like 15–25 % gives better results (see the sketch after this list).
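If you still want an explicit schedule, you can decay the learning rate of a single, long-lived Adam instance instead of rebuilding the optimizer each epoch. Here is a minimal sketch assuming a Keras/TensorFlow setup; the model, the random placeholder data, and the 20 % per-epoch decay factor are just illustrative assumptions, not taken from your code:

```python
import numpy as np
import tensorflow as tf

# Placeholder data, only so the snippet runs end to end.
x_train = np.random.rand(256, 10).astype("float32")
y_train = np.random.rand(256, 1).astype("float32")

model = tf.keras.Sequential([
    tf.keras.Input(shape=(10,)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1),
])

# One Adam instance for the whole run, so its moment estimates survive
# across epochs.
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-3)
model.compile(optimizer=optimizer, loss="mse")

def schedule(epoch, lr):
    # Reduce the learning rate by 20 % each epoch (factor 0.8),
    # rather than the 75 % cut discussed above.
    return lr * 0.8

model.fit(
    x_train, y_train,
    epochs=5,
    callbacks=[tf.keras.callbacks.LearningRateScheduler(schedule)],
    verbose=0,
)
```

The point is that the `LearningRateScheduler` callback only changes the step size of the existing optimizer, so you get the decay without throwing away Adam's internal state.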