Loss jumps abruptly whenever the learning rate is decayed with the Adam optimizer

Hi, I do not fully understand the problem either. However, here are some thoughts:

  • Your loss already decreases without explicit learning rate decay. Is there a particular reason you want to get learning rate decay working?
  • Adam already uses adaptive per-parameter learning rates intrinsically. For many problems that should be good enough. You can read more on this in this discussion on Stack Overflow.
  • Adam (like many other common optimization algorithms) adapts to a specific problem by estimating running first and second moments of the gradients. Creating a new optimizer every epoch therefore discards those estimates and should degrade performance.
  • I feel like decreasing the learning rate by 75 % might be too aggressive when using a momentum-based optimizer. It would be interesting to see whether reducing it by something like 15–25 % gives better results (see the sketch below this list).
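
To illustrate the last two points, here is a minimal sketch (assuming PyTorch, with a dummy linear model and random batch standing in for your actual setup): the learning rate is decayed in place through a scheduler attached to the same Adam instance, so the moment estimates are never thrown away, and the decay factor is a milder 0.8 instead of 0.25.

```python
import torch

# Hypothetical stand-in for your model and data; the point is the optimizer handling.
model = torch.nn.Linear(10, 1)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Decay by 20 % every 10 epochs (gamma=0.8) instead of cutting the rate by 75 % at once.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.8)

for epoch in range(100):
    x, y = torch.randn(32, 10), torch.randn(32, 1)   # dummy batch
    loss = torch.nn.functional.mse_loss(model(x), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()   # same Adam instance every epoch, so its moment estimates persist
    scheduler.step()   # shrinks the learning rate in place
```

The key design choice is that the optimizer object is created once and only its learning rate is adjusted; recreating Adam each epoch would reset the moment estimates to zero.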