Loss suddenly increases using Adam optimizer

AMSGrad does improve the training loss curve and keeps training progressing for more epochs, but after a certain number of epochs even AMSGrad's training loss starts to increase.
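
For reference, here is a minimal sketch of how AMSGrad is switched on for PyTorch's Adam (the linear model and random data below are just placeholders to make it runnable):

```python
import torch
import torch.nn as nn

# Toy model and data, only to illustrate the optimizer setup.
model = nn.Linear(10, 1)
criterion = nn.MSELoss()

# amsgrad=True makes Adam keep the running maximum of the second-moment
# estimate, so the effective per-parameter step size is non-increasing.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, amsgrad=True)

x = torch.randn(64, 10)
y = torch.randn(64, 1)

for epoch in range(100):
    optimizer.zero_grad()
    loss = criterion(model(x), y)
    loss.backward()
    optimizer.step()
```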
