How to escape from a saturated loss landscape while using Adam optimizer

I’m training an autoencoder network with the Adam optimizer (amsgrad=True). My loss decreases very rapidly at first, but then it gradually saturates.

How can I accelerate the learning to decrease my loss further? A standard strategy is to decay the learning rate, but when I do this with Adam my loss jumps abruptly.

I’m using PyTorch’s torch.optim.lr_scheduler.MultiStepLR() to decay my learning rate.
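For reference, here is a minimal sketch of the setup I mean (the model, milestones, and batch here are placeholders, not my actual configuration):

```python
import torch
import torch.nn as nn

# Stand-in autoencoder; my real network is larger.
model = nn.Sequential(nn.Linear(8, 4), nn.Linear(4, 8))

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, amsgrad=True)
# Multiply the lr by gamma at each milestone epoch.
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[30, 60], gamma=0.1
)

criterion = nn.MSELoss()
for epoch in range(90):
    x = torch.randn(16, 8)           # dummy batch
    loss = criterion(model(x), x)    # reconstruction loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()                 # step once per epoch
```

With these placeholder milestones, the learning rate drops from 1e-3 to 1e-4 at epoch 30 and to 1e-5 at epoch 60, and it is right after such a drop that I see the loss jump.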

Below is my loss plot.

Did you find a solution for this problem?

Not yet. I would be happy if anyone on the forum could experiment with this. I have also opened a thread on Stack with more details.