Saving and loading a model in PyTorch?


#1

Suppose I have a model class and a trainer class. I create an instance of the model and train it:

model = mymodel()
train = trainer.train(model...) 

How can I save the model to a file after it has been trained, and how can I then reload it and continue training? I searched for this but didn’t find an answer.


(Rinku Jadhav) #2

http://pytorch.org/docs/notes/serialization.html


(Mamy Ratsimbazafy) #3

@Rinku_Jadhav2014 unfortunately that tutorial is incomplete for resuming training. It only shows how to save the model; it does not save the optimizer state, the epoch, the score, etc.

@Bixqu You can check the ImageNet example, line 139:

        save_checkpoint({
            'epoch': epoch + 1,
            'arch': args.arch,
            'state_dict': model.state_dict(),
            'best_prec1': best_prec1,
            'optimizer' : optimizer.state_dict(),
        }, is_best)

With

import shutil
import torch

def save_checkpoint(state, is_best, filename='checkpoint.pth.tar'):
    torch.save(state, filename)
    if is_best:
        # keep a separate copy of the best-performing checkpoint
        shutil.copyfile(filename, 'model_best.pth.tar')

Loading/resuming from the saved dictionary is done like this:

    if args.resume:
        if os.path.isfile(args.resume):
            print("=> loading checkpoint '{}'".format(args.resume))
            checkpoint = torch.load(args.resume)
            args.start_epoch = checkpoint['epoch']
            best_prec1 = checkpoint['best_prec1']
            model.load_state_dict(checkpoint['state_dict'])
            optimizer.load_state_dict(checkpoint['optimizer'])
            print("=> loaded checkpoint '{}' (epoch {})"
                  .format(args.resume, checkpoint['epoch']))
        else:
            print("=> no checkpoint found at '{}'".format(args.resume))

(Diego Silva) #4

Hi! The best and safest way to save your model parameters is to do something like this:

  model = MyModel()
  # ... after training, save just the learned parameters
  torch.save(model.state_dict(), 'mytraining.pt')

  # ... to load your previously trained model:
  model.load_state_dict(torch.load('mytraining.pt'))

(Mamy Ratsimbazafy) #5

@diegslva Unfortunately this has the same issue as the tutorial: it won’t save the epoch or the optimizer state, so you can’t resume training, which was what the OP needed.


(Diego Silva) #6

@mratsim, you’re right! I misunderstood the question.
I don’t usually do this, but you could do something quick and dirty like the following to save your entire object:

import copy
import pickle

# model stuff
model = mymodel()
train = trainer.train(model...)

# deep-copy your entire trainer object and pickle it
saved_trainer = copy.deepcopy(train)
with open(r"my_trainer_object.pkl", "wb") as output_file:
    pickle.dump(saved_trainer, output_file)
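
To restore it later, unpickle the file (note that the class definitions must be importable in the environment that loads it back):

    # load the entire pickled trainer object back
    with open(r"my_trainer_object.pkl", "rb") as input_file:
        train = pickle.load(input_file)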

#7

@mratsim & @diegslva, when I want to save the trained (i.e., fine tuned) models of ResNet and DenseNet the torch.save(MyModel.state_dict(), './model.pth') method doesn’t work correctly; and when I used the torch.save(MyModel, './model.pth') then the models are saved correctly. It means that when I load my saved models via the first approach, my models don’t give me correct results, however when I use the second approach the results are good. Am I correct? would you please explain why this issue occurred?


#8

When you load the model back via the state_dict method, remember to call MyModel.eval(); otherwise the results will differ.
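
A minimal sketch of what that looks like (assuming the state dict was saved to 'mytraining.pt' as above):

    model = MyModel()
    model.load_state_dict(torch.load('mytraining.pt'))
    model.eval()  # put BatchNorm/Dropout layers into evaluation mode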


(Kelly Zhang) #9

Why will the results differ without calling MyModel.eval()?


#10

Because your BatchNorm and Dropout layers are in train mode by default, right from construction.
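
You can see the difference directly with a toy example (an illustrative snippet written for a recent PyTorch, not from this thread):

    import torch
    import torch.nn as nn

    net = nn.Sequential(nn.Linear(10, 10), nn.Dropout(p=0.5))
    x = torch.ones(1, 10)

    # train mode (the default): two forward passes give different outputs,
    # because dropout randomly zeroes different units each time
    print(net(x))
    print(net(x))

    net.eval()
    # eval mode: dropout is disabled, so the output is deterministic
    print(net(x))
    print(net(x))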


(Prasanna1991) #11

If my model doesn’t use layers like Dropout or BatchNorm, then it makes no difference whether I call model.eval() or not, right?


(Naofumi Tomita) #12

You’re right, it matters only when you use those layers, as described in the documentation. BatchNorm and Dropout are meant to behave differently at evaluation time, so you need to toggle the mode manually. You could alternatively use model.train(False), which is equivalent. Also, make sure to call eval() at validation time.


#13

I use .eval() and the results are still incorrect.


(Shuokai Pan) #14

Hi guys,

I have a question about the behaviour of the dropout layer during training and evaluation. I remember reading in a paper that because dropout leaves out some units during training, the outgoing weights of the dropout layer need to be scaled down at evaluation time by an amount corresponding to the dropout rate. For instance, if the dropout rate is 0.5, the outgoing weights need to be halved, because during evaluation we effectively have twice the number of units.

So my question is: is this kind of weight-scaling mechanism included in PyTorch’s dropout layer as well?

Cheers and thanks a lot for your help.
Shuokai


(Anuvabh) #15

model.eval() takes care of this. However, I think it is scaling the activations and not the weights.


(Shuokai Pan) #16

Ok I understand. Thanks for the help.

Cheers


(Herleeyandi Markoni) #17

@smth Is model.train(False) the same as model.eval()?


(ProKil) #18

It is true that model.eval() takes care of this; however, the scaling actually happens during training, not at evaluation.

As the PyTorch docs note, the outputs are scaled by a factor of 1/(1-p) during training. This means that during evaluation the module simply computes an identity function.
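
The 1/(1-p) scaling is easy to verify yourself (illustrative snippet):

    import torch
    import torch.nn as nn

    drop = nn.Dropout(p=0.5)

    drop.train()
    print(drop(torch.ones(4)))  # surviving entries are scaled by 1/(1-0.5) = 2.0

    drop.eval()
    print(drop(torch.ones(4)))  # identity: tensor([1., 1., 1., 1.])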


(gj) #19

Newbie question…
Are there any conventions for filename extensions when saving a whole model vs. just the model weights with the following commands?

torch.save(the_model, PATH)

torch.save(the_model.state_dict(), PATH)


#20

We’ve been using .pth, but it’s pretty arbitrary.