Training loss decreases at first but increases later

I trained an LSTM-MDN model with Adam. The training loss decreased at first, but after several hundred epochs it increased and ended up higher than its initial value. I then retrained from the checkpoint where the loss was lowest and reduced the learning rate by a factor of 10 (from 1e-3 to 1e-4); the training loss again decreased at first and increased later. I initially thought there was a bug in my code, but I couldn't find one. Then I replaced Adam with SGD (momentum=0): the training loss no longer increased, but it converged to a relatively large value, higher than the loss reached with Adam, so I suspect something is going wrong with Adam.
I never found the reason and hope someone can help me figure it out. Thanks!
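For reference, the optimizer settings I tried look roughly like this (a simplified sketch; the model below is just a stand-in for my LSTM-MDN, and the SGD learning rate is illustrative):

```python
import torch.nn as nn
import torch.optim as optim

# Stand-in for the LSTM-MDN network (the real model is not shown here)
model = nn.LSTM(input_size=10, hidden_size=20, num_layers=2)

# First run: Adam with lr=1e-3
optimizer = optim.Adam(model.parameters(), lr=1e-3)

# Retraining from the lowest-loss checkpoint: learning rate reduced 10x
optimizer = optim.Adam(model.parameters(), lr=1e-4)

# Third experiment: plain SGD without momentum (lr here is just illustrative)
optimizer = optim.SGD(model.parameters(), lr=1e-3, momentum=0)
```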
[plot: training loss with Adam]

[image: loss function]

Do you zero out the gradients after each optimizer step?
I’ve seen similar behavior when the gradients were accidentally accumulated.
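As a quick check, the pattern should look something like this (a minimal, generic sketch with placeholder names, not your actual code):

```python
import torch
import torch.nn as nn
import torch.optim as optim

# Toy setup just to illustrate the pattern (not an LSTM-MDN)
model = nn.Linear(8, 1)
criterion = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr=1e-3)

for step in range(100):
    x = torch.randn(32, 8)
    y = torch.randn(32, 1)

    optimizer.zero_grad()          # clear gradients from the previous step
    loss = criterion(model(x), y)
    loss.backward()                # backward() accumulates into .grad
    optimizer.step()

# Without the zero_grad call, the .grad buffers keep accumulating across
# iterations, the updates grow, and the loss can blow up after an initial
# decrease.
```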

I really appreciate your reply. Here is my code. Is the order of computing the loss and calling zero_grad wrong? Thank you very much!

When I replaced Adam with SGD (momentum=0), the training loss didn't increase, but it converged to a relatively large value, higher than the loss reached with Adam.

The order of your calls looks alright.
I’m unfortunately not really familiar with your use case and the loss function you are using.
Also, what is ensure_shared_grad doing? Is it just copying the current gradients to another model?

Yes, it is. Thanks for your reply. I'm still looking for the reason. Maybe there is something wrong with Adam?
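Roughly, ensure_shared_grad does something like this (a simplified sketch; my actual helper may differ slightly):

```python
def ensure_shared_grad(model, shared_model):
    """Copy gradients from the local model to the shared model (simplified)."""
    for param, shared_param in zip(model.parameters(), shared_model.parameters()):
        if shared_param.grad is not None:
            # Shared gradients were already set; don't overwrite them
            return
        shared_param.grad = param.grad
```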

I also encountered this issue.
Are there any suggestions for when we use the Adam optimizer?