If you trained your model with Adam, you need to save the optimizer state dict as well and reload it when you resume. Also, if you used any learning rate decay, you need to reload the scheduler state too; otherwise it gets reset and you may end up with a higher learning rate that makes the optimization oscillate. Finally, if your model architecture contains dropout or batch norm layers and you saved the model after a test loop (in which case `model.eval()` was called), make sure to call `model.train()` before the training loop.
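
Here is a minimal sketch of what that looks like; the model, scheduler choice, and the `checkpoint.pt` filename are just placeholders for illustration:

```python
import torch
import torch.nn as nn
import torch.optim as optim

# Placeholder model/optimizer/scheduler, assuming Adam + StepLR for the example
model = nn.Sequential(nn.Linear(10, 10), nn.Dropout(0.5), nn.Linear(10, 2))
optimizer = optim.Adam(model.parameters(), lr=1e-3)
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.1)

# Save everything needed to resume training, not just the model weights
torch.save({
    'model_state_dict': model.state_dict(),
    'optimizer_state_dict': optimizer.state_dict(),
    'scheduler_state_dict': scheduler.state_dict(),
}, 'checkpoint.pt')

# Resume: restore the model, the Adam moment estimates, and the scheduler step
checkpoint = torch.load('checkpoint.pt')
model.load_state_dict(checkpoint['model_state_dict'])
optimizer.load_state_dict(checkpoint['optimizer_state_dict'])
scheduler.load_state_dict(checkpoint['scheduler_state_dict'])

# Switch dropout/batch norm back to training mode before the training loop
model.train()
```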