I’m training a simple classification model with the Adam optimizer. I trained the model to convergence, but when I reload the model and resume training, the loss jumps to a very high value. What’s wrong with my training? Every time I reload the model, the loss at the very beginning is very large.
Are you resuming the training with a small enough learning rate? The loss can easily increase if you resume with the default learning rate, because it’s very likely to overshoot the current local minimum.
Given that nothing has changed in your model, are you using the same dataset? Maybe you forgot to normalize/standardize your inputs?
It is fairly common for a neural net to converge and then diverge. That is why many people monitor the validation loss and either reduce the learning rate or stop training when the validation loss stops going down.
I’m using the same dataset, and I see a dramatic loss increase right after I stop training and restart it.
Are you also saving and reloading the optimizer? Adam uses adaptive per-parameter learning rates, i.e. it keeps internal state (running moment estimates of the gradients) that you should restore when you resume training…
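To make that concrete, here is a small sketch (the toy parameter is just an illustration) showing that Adam builds up per-parameter state after a single step, which a freshly constructed optimizer does not have:

```python
import torch

# A toy parameter and a fresh Adam optimizer (illustrative values).
p = torch.nn.Parameter(torch.randn(3))
opt = torch.optim.Adam([p], lr=1e-3)

# Before any step, the optimizer has no state for p.
print(len(opt.state))  # 0

# One gradient step populates Adam's running moment estimates.
p.grad = torch.ones(3)
opt.step()

# The state now holds the step count and the first/second moment estimates.
print(sorted(opt.state[p].keys()))
```

If you recreate the optimizer when resuming, this state starts from zero again, and the first updates can be large enough to kick the model out of its minimum.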
No, I didn’t save the Adam optimizer state…
You can save and load the optimizer state in a similar way to the model state.
torch.save(optimizer.state_dict(), filename)
optimizer.load_state_dict(torch.load(filename))