PyTorch loss does not decrease

Peter_Ham · March 1, 2018, 10:14pm

I’m training a simple classification model with the Adam optimizer. I first trained the model to convergence. However, when I reload the model and continue training, the loss climbs to a very high value. What’s wrong with my training? Every time I reload the model, the loss at the very beginning is very large.

yf225 · March 1, 2018, 11:06pm

Do you resume training with a small enough learning rate? The loss can easily increase with the default learning rate, because a large step is very likely to overshoot the current local minimum.
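As a rough sketch of what resuming with a smaller learning rate could look like: the `Linear` model below is only a hypothetical stand-in for the reloaded model, and the values `1e-4` and `1e-5` are arbitrary examples, not recommendations.

```python
import torch

# Hypothetical stand-in for the model you reloaded from disk.
model = torch.nn.Linear(10, 2)

# Adam's default learning rate is 1e-3; pass a smaller one when resuming.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

# The rate of an existing optimizer can also be lowered in place.
for group in optimizer.param_groups:
    group["lr"] = 1e-5
```

Changing `param_groups` in place is handy when the optimizer has already been restored and only the step size should change.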

rasbt · March 2, 2018, 1:53am

Given that nothing has changed in your model, are you using the same dataset? Maybe you forgot to normalize/standardize your inputs?

jpeg729 · March 2, 2018, 7:36am

It is fairly common for a neural net to converge and then diverge. That is why many people monitor the validation loss and either reduce the learning rate or stop training when the validation loss stops going down.
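One way to reduce the learning rate when the validation loss stalls is `torch.optim.lr_scheduler.ReduceLROnPlateau`. The snippet below is only a minimal sketch: the model is a hypothetical stand-in, the validation losses are made-up numbers, and `factor` and `patience` are arbitrary choices.

```python
import torch

model = torch.nn.Linear(10, 2)  # hypothetical stand-in model
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Multiply the learning rate by `factor` once the monitored value has
# failed to improve for more than `patience` consecutive epochs.
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="min", factor=0.1, patience=2
)

# Made-up validation losses that improve and then plateau.
val_losses = [1.0, 0.8, 0.7, 0.7, 0.7, 0.7]

lrs = []
for val_loss in val_losses:
    # ... train for one epoch and compute val_loss on held-out data here ...
    scheduler.step(val_loss)
    lrs.append(optimizer.param_groups[0]["lr"])
```

Early stopping follows the same pattern: count the epochs without improvement and break out of the loop once the count exceeds your patience.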

Peter_Ham · March 2, 2018, 8:40am

I’m using the same dataset, and I can see a dramatic loss increase after I stop and then restart training.

dpernes · March 2, 2018, 11:47am

Are you also saving and reloading the optimizer? Adam uses adaptive learning rates, i.e. it keeps an internal state from which you should start when you resume training…
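To see that internal state, you can inspect the optimizer's `state_dict()`. This is a small illustration on a hypothetical toy model with random inputs, not the original poster's setup.

```python
import torch

model = torch.nn.Linear(4, 2)  # hypothetical toy model
optimizer = torch.optim.Adam(model.parameters())

# Take one optimization step so that Adam populates its state.
loss = model(torch.randn(8, 4)).sum()
loss.backward()
optimizer.step()

# Per-parameter state includes running averages of the gradient
# ("exp_avg") and of the squared gradient ("exp_avg_sq").
state = optimizer.state_dict()["state"]
first = state[0]
print(sorted(first.keys()))

# A freshly constructed optimizer has none of this history yet.
fresh = torch.optim.Adam(model.parameters())
print(len(fresh.state_dict()["state"]))
```

Creating a new optimizer after reloading the model discards these running averages, so the first updates after a restart are scaled differently from the last updates before it.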

Peter_Ham · March 6, 2018, 8:06am

No, I didn’t save the Adam optimizer…

jpeg729 · March 6, 2018, 8:12am

You can save and load the optimizer state in a similar way to the model state.

torch.save(optimizer.state_dict(), filename)
optimizer.load_state_dict(torch.load(filename))
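It is often convenient to keep both state dicts in a single checkpoint file. A minimal sketch, assuming a hypothetical toy model, a helper `make_model_and_optimizer` invented for this example, and a temporary file path:

```python
import os
import tempfile

import torch


def make_model_and_optimizer():
    # Hypothetical helper: rebuild the same architecture and optimizer type.
    model = torch.nn.Linear(4, 2)
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    return model, optimizer


model, optimizer = make_model_and_optimizer()

# One training step so that Adam has internal state worth saving.
loss = model(torch.randn(8, 4)).pow(2).mean()
loss.backward()
optimizer.step()

# Save the model and optimizer state together in one file.
path = os.path.join(tempfile.mkdtemp(), "checkpoint.pt")
torch.save(
    {"model": model.state_dict(), "optimizer": optimizer.state_dict()},
    path,
)

# Later: rebuild the objects, then restore both states before training.
new_model, new_optimizer = make_model_and_optimizer()
checkpoint = torch.load(path)
new_model.load_state_dict(checkpoint["model"])
new_optimizer.load_state_dict(checkpoint["optimizer"])
```

The optimizer has to be constructed over the new model's parameters before `load_state_dict` is called, because the saved state is matched to parameters by their position in the parameter groups.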