Saving and loading a model in PyTorch?

Hi, I’m trying to implement training with checkpoints using the above ideas, so that I can resume training from, say, Epoch k and continue training the model from Epoch k to N. Suppose I’ve saved the following into the model file and reloaded them when resuming training: the epoch, the model’s state_dict(), and the optimizer. However, I’m not seeing similar training results between these two approaches:

  1. Train the model from Epoch 1 to N.
  2. Train the model from Epoch 1 to k, save the model, and resume training from Epoch k to N.
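The two procedures above can be reproduced on a toy problem to check whether a save/resume round trip changes the result. This is a minimal sketch, not the poster's code. It assumes a tiny `nn.Linear` model, a fixed full-batch dataset (no shuffling, no dropout), and hypothetical helpers `train_epochs`, `save_checkpoint` and `load_checkpoint`. The checkpoint holds the epoch, the model's `state_dict()` and the optimizer's `state_dict()`, and it is written to an in-memory buffer rather than a file.

```python
import io

import torch
import torch.nn as nn


def make_model_and_optimizer():
    # Fixed seed so every run starts from identical initial weights (toy model, assumed).
    torch.manual_seed(0)
    model = nn.Linear(4, 1)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=1e-4)
    return model, optimizer


# Hypothetical fixed dataset: no shuffling and no dropout, so every epoch is deterministic.
g = torch.Generator().manual_seed(1)
X = torch.randn(32, 4, generator=g)
y = torch.randn(32, 1, generator=g)


def train_epochs(model, optimizer, start_epoch, end_epoch):
    # One full-batch SGD step per epoch, for epochs start_epoch..end_epoch inclusive.
    loss_fn = nn.MSELoss()
    for _ in range(start_epoch, end_epoch + 1):
        optimizer.zero_grad()
        loss = loss_fn(model(X), y)
        loss.backward()
        optimizer.step()


def save_checkpoint(buffer, epoch, model, optimizer):
    torch.save({
        "epoch": epoch,
        "model_state_dict": model.state_dict(),
        # The optimizer's state_dict also carries SGD's per-parameter momentum buffers.
        "optimizer_state_dict": optimizer.state_dict(),
    }, buffer)


def load_checkpoint(buffer, model, optimizer):
    buffer.seek(0)
    checkpoint = torch.load(buffer)
    model.load_state_dict(checkpoint["model_state_dict"])
    optimizer.load_state_dict(checkpoint["optimizer_state_dict"])
    return checkpoint["epoch"]


N, k = 10, 4

# 1) Train straight through from epoch 1 to N.
model_a, opt_a = make_model_and_optimizer()
train_epochs(model_a, opt_a, 1, N)

# 2) Train epochs 1..k and save a checkpoint.
model_b, opt_b = make_model_and_optimizer()
train_epochs(model_b, opt_b, 1, k)
buf = io.BytesIO()
save_checkpoint(buf, k, model_b, opt_b)

# Resume into freshly constructed objects, as a separate training script would.
model_c, opt_c = make_model_and_optimizer()
last_epoch = load_checkpoint(buf, model_c, opt_c)
# Epoch k was already completed before saving, so continue at k + 1.
train_epochs(model_c, opt_c, last_epoch + 1, N)

print(torch.allclose(model_a.weight, model_c.weight), torch.allclose(model_a.bias, model_c.bias))
```

Under these deterministic assumptions, the straight run (`model_a`) and the resumed run (`model_c`) should end with matching parameters. In a real script the buffer would be replaced by a file path, and data shuffling or dropout would add random-number state that this sketch does not save.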

I checked that the learning rates are consistent between 1) and 2), and I’m using SGD with the same momentum and weight decay rates in both.
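One way to make that consistency check explicit is to read the hyperparameters straight from `optimizer.param_groups`. The sketch below assumes a toy `nn.Linear` model, and the helper name `sgd_settings` is hypothetical. Besides `lr`, `momentum` and `weight_decay`, it also reports whether each parameter has a momentum buffer, because SGD keeps those buffers in the optimizer's per-parameter state rather than in `param_groups`.

```python
import torch
import torch.nn as nn


def sgd_settings(optimizer):
    """Per-group SGD hyperparameters plus whether each parameter has a momentum buffer."""
    settings = []
    for group in optimizer.param_groups:
        settings.append({
            "lr": group["lr"],
            "momentum": group["momentum"],
            "weight_decay": group["weight_decay"],
            "dampening": group["dampening"],
            "nesterov": group["nesterov"],
            # SGD creates a momentum buffer for a parameter on its first step (momentum != 0).
            "has_momentum_buffers": [
                "momentum_buffer" in optimizer.state[p] for p in group["params"]
            ],
        })
    return settings


torch.manual_seed(0)
model = nn.Linear(4, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=1e-4)

# One training step so that SGD populates its momentum buffers.
loss = model(torch.randn(8, 4)).pow(2).mean()
loss.backward()
opt.step()

# A freshly constructed optimizer (same hyperparameters) has no momentum buffers yet.
resumed = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=1e-4)
before = sgd_settings(resumed)

# Loading the saved optimizer state restores the buffers along with the hyperparameters.
resumed.load_state_dict(opt.state_dict())
after = sgd_settings(resumed)

print(before[0]["has_momentum_buffers"], after == sgd_settings(opt))
```

For brevity both optimizers here wrap the same model; a resumed run would normally build a new model and load its `state_dict()` as well.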

Any ideas where I should be looking?
Thanks!
