How to save a model from a previous epoch?

The current method of saving a model seems to be this: https://cs230-stanford.github.io/pytorch-getting-started.html#saving-and-loading-models

What if I train a model for 50 epochs but notice that it starts to overfit at the 40th epoch? How can I save and load the model weights from the 40th epoch?


You can call torch.save() multiple times in your training routine and save the model’s data in different output files.

Recommended approach

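import os
import torch
# SAVE_DIR and MODEL are assumed to be defined earlier in the training script.
path = os.path.join(SAVE_DIR, 'model.pth')
torch.save(MODEL.cpu().state_dict(), path)  # move the model to the CPU and save its weights
MODEL.cuda()  # move the model back to the GPU for further training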

Does torch.save() overwrite the previously saved model, or can I save multiple models?

If you save it into another buffer or file it will not overwrite the previous one.
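To illustrate the point above, here is a minimal sketch. It assumes a tiny stand-in model (nn.Linear) and a temporary directory, neither of which comes from the thread. Saving twice to the same path replaces the file, while saving to a separate in-memory buffer leaves the file as it was.

```python
import io
import os
import tempfile

import torch
import torch.nn as nn

model = nn.Linear(4, 2)  # hypothetical stand-in for the real model

with tempfile.TemporaryDirectory() as save_dir:
    path = os.path.join(save_dir, "model.pth")

    # First save to a file path.
    torch.save(model.state_dict(), path)
    first_weight = model.weight.detach().clone()

    # Change the weights and save to the same path: the file is replaced.
    with torch.no_grad():
        model.weight.add_(1.0)
    torch.save(model.state_dict(), path)

    # Change the weights again and save to an in-memory buffer instead:
    # the file on disk is not touched.
    with torch.no_grad():
        model.weight.add_(1.0)
    buffer = io.BytesIO()
    torch.save(model.state_dict(), buffer)

    # Read both copies back while the temporary directory still exists.
    from_file = torch.load(path)
    buffer.seek(0)
    from_buffer = torch.load(buffer)
```

Here from_file holds the weights from the second save to the path, and from_buffer holds the later weights that were only written to the buffer.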

So if you follow the recommended approach @alwynmathew mentioned, you can for example use the number of the current epoch in the filename.

Example, where:
model is the model to save
epoch is the counter for the current epoch
model_dir is the directory in which you want to save your models
You can call this, for example, every five or ten epochs.

torch.save(model.state_dict(), os.path.join(model_dir, 'epoch-{}.pt'.format(epoch)))
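Put into a training loop, the call above might look like the following sketch. The model, the dummy data, the optimizer settings, the temporary model_dir, and the save_every interval are all assumptions made for illustration. Only the torch.save line is taken from the post.

```python
import os
import tempfile

import torch
import torch.nn as nn

model = nn.Linear(4, 1)  # hypothetical stand-in model
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
inputs, targets = torch.randn(16, 4), torch.randn(16, 1)  # dummy data

model_dir = tempfile.mkdtemp()  # assumption: any writable directory works
num_epochs = 50
save_every = 10  # assumption: save a checkpoint every 10 epochs

for epoch in range(1, num_epochs + 1):
    # One (very small) training step per epoch.
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(model(inputs), targets)
    loss.backward()
    optimizer.step()

    # The epoch number in the filename gives each checkpoint its own file.
    if epoch % save_every == 0:
        path = os.path.join(model_dir, 'epoch-{}.pt'.format(epoch))
        torch.save(model.state_dict(), path)

saved = sorted(os.listdir(model_dir))
```

With these settings the directory ends up with one file per saved epoch (epoch-10.pt through epoch-50.pt), so an earlier state such as epoch 40 stays available after training finishes.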

So if I call that function at certain epochs (say, every 10), it will save the model as a new file named epoch-<number>.pt?

Thanks!

Small correction to @mteser's answer:

torch.save(model.state_dict(), os.path.join(model_dir, 'epoch-{}.pth'.format(epoch)))
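For the loading half of the original question, a minimal sketch could look like this. The build_model helper and the temporary directory are hypothetical. The sketch assumes the checkpoint was written with the call above at epoch 40 and that the same architecture is rebuilt before the weights are loaded.

```python
import os
import tempfile

import torch
import torch.nn as nn


def build_model():
    # Hypothetical architecture; it must match the one that was saved.
    return nn.Linear(4, 1)


model_dir = tempfile.mkdtemp()  # assumption: stands in for the real model_dir
path = os.path.join(model_dir, 'epoch-{}.pth'.format(40))

# Stand-in for the checkpoint written during training at epoch 40.
trained = build_model()
torch.save(trained.state_dict(), path)

# Restore: build a fresh model, then copy the saved weights into it.
restored = build_model()
state_dict = torch.load(path, map_location='cpu')  # load tensors onto the CPU
restored.load_state_dict(state_dict)
restored.eval()  # switch to evaluation mode before running inference
```

Passing map_location='cpu' lets the file load on a machine without a GPU; the restored model can be moved to a GPU afterwards if needed.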

I also found examples in the documentation that use .pt instead of .pth, for example here, as well as some that use .pth.
It is just a name, but is one file suffix explicitly recommended anywhere?

You are right @mteser, it doesn’t matter. link