Saving and loading a model in PyTorch?

http://pytorch.org/docs/notes/serialization.html

@Rinku_Jadhav2014 unfortunately that tutorial is incomplete for resuming training. It only covers saving the model; it does not save the optimizer state, epoch, score, etc.

@Bixqu You can check the ImageNet example, line 139:

        # bundle everything needed to resume: epoch, architecture, weights,
        # best metric so far, and optimizer state
        save_checkpoint({
            'epoch': epoch + 1,
            'arch': args.arch,
            'state_dict': model.state_dict(),
            'best_prec1': best_prec1,
            'optimizer': optimizer.state_dict(),
        }, is_best)

With

import shutil
import torch

def save_checkpoint(state, is_best, filename='checkpoint.pth.tar'):
    torch.save(state, filename)  # serialize the whole checkpoint dict
    if is_best:
        # keep a separate copy of the best checkpoint so far
        shutil.copyfile(filename, 'model_best.pth.tar')

Loading/resuming from the checkpoint dictionary is done like this:

    if args.resume:
        if os.path.isfile(args.resume):
            print("=> loading checkpoint '{}'".format(args.resume))
            checkpoint = torch.load(args.resume)
            args.start_epoch = checkpoint['epoch']
            best_prec1 = checkpoint['best_prec1']
            model.load_state_dict(checkpoint['state_dict'])
            optimizer.load_state_dict(checkpoint['optimizer'])
            print("=> loaded checkpoint '{}' (epoch {})"
                  .format(args.resume, checkpoint['epoch']))
        else:
            print("=> no checkpoint found at '{}'".format(args.resume))

Hi! The best and safest way to save your model parameters is something like this:

  import torch

  model = MyModel()
  # ... after training, save your model parameters
  torch.save(model.state_dict(), 'mytraining.pt')

  # ... to load your previously trained model:
  model.load_state_dict(torch.load('mytraining.pt'))

@diegslva Unfortunately this has the same issue as the tutorial: it won’t save the epoch or the optimizer state, so you can’t resume training, which is what the OP needed.


@mratsim, you’re right! I misunderstood the question.
I don’t usually do this, but maybe something quick and dirty like the following will save the entire object:

import copy
import pickle

# model / training stuff
model = MyModel()
train = trainer.train(model, ...)  # arguments elided

# deep-copy the entire trainer object and pickle it
saved_trainer = copy.deepcopy(train)
with open("my_trainer_object.pkl", "wb") as output_file:
    pickle.dump(saved_trainer, output_file)
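
To load the pickled object back later, a sketch (keep in mind that pickling whole objects is fragile: it breaks if the class definitions or library versions change):

    import pickle

    with open("my_trainer_object.pkl", "rb") as input_file:
        restored_trainer = pickle.load(input_file)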

@mratsim & @diegslva, when I want to save my trained (i.e., fine-tuned) ResNet and DenseNet models, the torch.save(MyModel.state_dict(), './model.pth') approach doesn’t work correctly, while torch.save(MyModel, './model.pth') saves them correctly. That is, when I load the models saved with the first approach they don’t give me correct results, but with the second approach the results are good. Am I missing something? Would you please explain why this happens?


When you load the model back via the state_dict method, remember to call MyModel.eval(), otherwise the results will differ.
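
For instance, a minimal sketch of that pattern (reusing the hypothetical MyModel and mytraining.pt names from the earlier post):

    import torch

    model = MyModel()
    model.load_state_dict(torch.load('mytraining.pt'))
    model.eval()  # switch BatchNorm/Dropout to evaluation behaviour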


Why will the results differ without calling MyModel.eval()?


Because your BatchNorm and Dropout layers are in train mode by default on construction.
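
You can verify this yourself with a quick sketch:

    import torch.nn as nn

    bn = nn.BatchNorm1d(4)
    drop = nn.Dropout(p=0.5)
    print(bn.training, drop.training)  # True True: modules start in train mode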


If my model doesn’t use layers like dropout or batchnorm, then it makes no difference whether I call model.eval() or not, right?


You’re right. It matters only when you use those layers, as described in the documentation. BN/Dropout behave differently at evaluation time, so you need to manually toggle the setting; you could alternatively use model.train(False), which is equivalent. Also, make sure to call eval() at validation time.
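
A tiny sketch of the equivalence (the model here is just a placeholder):

    import torch.nn as nn

    model = nn.Sequential(nn.Linear(4, 4), nn.Dropout(p=0.5))
    model.eval()        # recursively sets training = False on every submodule
    assert not model.training
    model.train(False)  # identical effect
    assert not model.training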


I use .eval() and the results are still incorrect.


Hi guys,

I have a question about the behaviour of the dropout layer during training and evaluation. I remember reading in a paper that because dropout leaves out some units during training, the outgoing weights of the dropout layer need to be scaled down during evaluation by an amount corresponding to the dropout rate. For instance, if the dropout rate is 0.5, the outgoing weights need to be halved, because during evaluation we effectively have twice the number of active units.

So my question is: is this kind of weight-scaling mechanism included in PyTorch’s dropout layer as well?

Cheers and thanks a lot for your help.
Shuokai


model.eval() takes care of this. However, I think it is scaling the activations and not the weights.


Ok I understand. Thanks for the help.

Cheers

@smth Is model.train(False) the same as model.eval()?

It is true that model.eval() takes care of this. However, it applies the scaling during training, not at evaluation. From the docs:

Furthermore, the outputs are scaled by a factor of 1/(1-p) during training. This means that during evaluation the module simply computes an identity function.
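
A quick sketch demonstrating that inverted-dropout behaviour:

    import torch
    import torch.nn as nn

    drop = nn.Dropout(p=0.5)
    x = torch.ones(1, 8)

    drop.train()
    print(drop(x))  # surviving entries become 2.0, i.e. 1 / (1 - p); the rest are 0

    drop.eval()
    print(drop(x))  # identity: every entry stays 1.0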


Newbie question…
Any conventions for filename extensions when saving a full model vs. just the model weights with the following commands?

torch.save(the_model, PATH)

torch.save(the_model.state_dict(), PATH)


we’ve been using .pth, but it’s pretty arbitrary
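
For reference, a rough sketch of both styles with that extension (the_model is the placeholder from the question above, and the filenames are arbitrary):

    # weights only: small and portable, but you need the model class to rebuild it
    torch.save(the_model.state_dict(), 'the_model_weights.pth')
    the_model.load_state_dict(torch.load('the_model_weights.pth'))

    # whole module: pickles the class as well, so it is tied to your source tree
    torch.save(the_model, 'the_model.pth')
    the_model = torch.load('the_model.pth')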

Hi, I’m trying to implement training with checkpoints using the ideas above, so that I can resume training from, say, epoch k and re-train the model from epoch k to N. Suppose I’ve saved the epoch, the model’s state_dict(), and the optimizer state into the checkpoint file and reload them when resuming. I’m not seeing similar training results between these two approaches:

  1. Train the model from epoch 1 to N.
  2. Train the model from epoch 1 to k, save the model, and resume training from epoch k to N.

I checked that the learning rates are consistent between 1) and 2), using SGD with the same momentum and weight decay.

Any ideas on what I should look into?
Thanks!
