Resume training with saved model

Hi everyone :slight_smile:

I have a general question regarding saving and loading models in PyTorch.

My case:

  1. I save a checkpoint consisting of the model.state_dict, optimizer.state_dict, and the last epoch.
  2. The saved checkpoint refers to the best performing model, evaluated by accuracy.
  3. I load all three checkpoint entries and resume. However, I do not want to continue training; I want to use the saved state and make one forward pass to get the same accuracy I had when I saved the checkpoint. How can I do that?
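For reference, a minimal sketch of the save/resume flow I mean (names like `model`, `optimizer`, and `checkpoint.pt` are placeholders, not my actual code):

```python
import torch
import torch.nn as nn

# placeholder model/optimizer standing in for the real ones
model = nn.Linear(4, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# 1. save model state, optimizer state, and the last epoch
torch.save({
    "model_state_dict": model.state_dict(),
    "optimizer_state_dict": optimizer.state_dict(),
    "epoch": 10,
}, "checkpoint.pt")

# 3. load all three entries back and run a single evaluation pass
checkpoint = torch.load("checkpoint.pt")
model.load_state_dict(checkpoint["model_state_dict"])
optimizer.load_state_dict(checkpoint["optimizer_state_dict"])
epoch = checkpoint["epoch"]

model.eval()  # evaluation mode: disables Dropout / BatchNorm updates
with torch.no_grad():
    out = model(torch.zeros(1, 4))  # one forward pass over validation data
```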

Basically, I want to be able to reproduce my results, since I have not figured out how to seed properly in PyTorch. It somehow does not really work… So I figured I could do it by saving and loading models.

Any help is very much appreciated!

All the best,
snowe

Hi,

First of all, here you can find a short documentation about reproducibility in PyTorch.
I think your results should be reproducible if you are able to load the same data as in the original evaluation (e.g. your validation dataset) and you haven’t used any random operations in your first evaluation, such as Dropout.
How different are your current results from the original ones?
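As a small illustration of such a random operation: Dropout produces a different output on every forward pass in training mode, but calling `.eval()` turns it into a no-op, which is one requirement for reproducible evaluation (the tensor sizes below are arbitrary):

```python
import torch
import torch.nn as nn

drop = nn.Dropout(p=0.5)
x = torch.ones(1, 8)

drop.train()   # training mode: dropout masks activations randomly
a = drop(x)    # random mask applied, scaled by 1 / (1 - p)

drop.eval()    # evaluation mode: dropout passes the input through unchanged
c = drop(x)
d = drop(x)    # identical to c (and to x) on every call
```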

Hi @Caruso,

thank you for reaching out! :slight_smile:

I am aware of the different seeding functions in PyTorch and I have used them. Somehow my results still differ by roughly 1–2%. My model does not use Dropout…

All the best,
snowe

UPDATE
I was able to make my results reproducible by using the following code:

import random

import numpy as np
import torch

def set_seed(seed):
    # seed the Python, NumPy, and PyTorch RNGs
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    # for cuda
    torch.cuda.manual_seed_all(seed)
    # make cuDNN deterministic (disables non-deterministic autotuning)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
    torch.backends.cudnn.enabled = False

I call the set_seed function at the start of every run. Tbh, I don’t fully understand why I cannot just set the seed once, but it works :wink:
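My understanding of why re-seeding is needed (an assumption on my part, not from the docs): every random operation consumes and advances the global RNG state, so after a run the state is no longer the one the seed produced, and calling set_seed again resets it. A tiny demo:

```python
import torch

torch.manual_seed(42)
a = torch.rand(3)     # advances the global RNG state
b = torch.rand(3)     # different values: the state has moved on

torch.manual_seed(42) # reset the state to where the seed put it
c = torch.rand(3)     # identical to a again
```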