Saving and loading a model in PyTorch?


#1

Suppose I have a model class and a trainer class. I create an instance of the model and train it:

model = mymodel()
train = trainer.train(model...) 

How can I save the model to a file after it has been trained, and how can I then reload it and continue training? I searched for this but didn’t find an answer.


(Rinku Jadhav) #2

http://pytorch.org/docs/notes/serialization.html


(Mamy Ratsimbazafy) #3

@Rinku_Jadhav2014 unfortunately that tutorial is incomplete for resuming training. It only shows how to save the model; it does not save the optimizer state, the epoch, the score, etc.

@Bixqu You can check the ImageNet example, line 139:

        save_checkpoint({
            'epoch': epoch + 1,
            'arch': args.arch,
            'state_dict': model.state_dict(),
            'best_prec1': best_prec1,
            'optimizer' : optimizer.state_dict(),
        }, is_best)

With

import shutil
import torch

def save_checkpoint(state, is_best, filename='checkpoint.pth.tar'):
    torch.save(state, filename)
    if is_best:
        # keep a separate copy of the best-performing checkpoint
        shutil.copyfile(filename, 'model_best.pth.tar')

Loading/resuming from the saved dictionary is done like this:

    if args.resume:
        if os.path.isfile(args.resume):
            print("=> loading checkpoint '{}'".format(args.resume))
            checkpoint = torch.load(args.resume)
            args.start_epoch = checkpoint['epoch']
            best_prec1 = checkpoint['best_prec1']
            model.load_state_dict(checkpoint['state_dict'])
            optimizer.load_state_dict(checkpoint['optimizer'])
            print("=> loaded checkpoint '{}' (epoch {})"
                  .format(args.resume, checkpoint['epoch']))
        else:
            print("=> no checkpoint found at '{}'".format(args.resume))

(Diego Silva) #4

Hi! The best and safest way to save your model parameters is to do something like this:

  model = MyModel()
  # ... after training, save just the learned parameters
  torch.save(model.state_dict(), 'mytraining.pt')

  # ... to load your previously trained model:
  model.load_state_dict(torch.load('mytraining.pt'))

(Mamy Ratsimbazafy) #5

@diegslva Unfortunately this has the same issue as the tutorial: it won’t save the epoch or the optimizer state, so you can’t resume training, which was what the OP needed.


(Diego Silva) #6

@mratsim, you’re right! I misunderstood the question.
I don’t usually do this, but you could do something quick and dirty like the following to save your entire object:

import copy
import pickle

# model stuff
model = mymodel()
train = trainer.train(model...)

# deep-copy your entire trainer object and pickle it
saved_trainer = copy.deepcopy(train)
with open(r"my_trainer_object.pkl", "wb") as output_file:
    pickle.dump(saved_trainer, output_file)
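
To restore it later, unpickle the file (note that the class definitions must be importable in the environment that loads it back):

    # load the entire pickled trainer object back
    with open(r"my_trainer_object.pkl", "rb") as input_file:
        train = pickle.load(input_file)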

#7

@mratsim & @diegslva, when I want to save the trained (i.e., fine tuned) models of ResNet and DenseNet the torch.save(MyModel.state_dict(), './model.pth') method doesn’t work correctly; and when I used the torch.save(MyModel, './model.pth') then the models are saved correctly. It means that when I load my saved models via the first approach, my models don’t give me correct results, however when I use the second approach the results are good. Am I correct? would you please explain why this issue occurred?


#8

When you load the model back via the state_dict method, remember to call MyModel.eval(); otherwise the results will differ.
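
A minimal sketch of what that looks like (assuming the state dict was saved to 'mytraining.pt' as above):

    model = MyModel()
    model.load_state_dict(torch.load('mytraining.pt'))
    model.eval()  # put BatchNorm/Dropout layers into evaluation mode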


(Kelly Zhang) #9

Why will the results differ without calling MyModel.eval()?


#10

Because your BatchNorm and Dropout layers are in train mode by default, right from construction.
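
You can see the difference directly with a toy example (an illustrative snippet written for a recent PyTorch, not from this thread):

    import torch
    import torch.nn as nn

    net = nn.Sequential(nn.Linear(10, 10), nn.Dropout(p=0.5))
    x = torch.ones(1, 10)

    # train mode (the default): two forward passes give different outputs,
    # because dropout randomly zeroes different units each time
    print(net(x))
    print(net(x))

    net.eval()
    # eval mode: dropout is disabled, so the output is deterministic
    print(net(x))
    print(net(x))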


(Prasanna1991) #11

If my model doesn’t use layers like Dropout or BatchNorm, then it makes no difference whether I call model.eval() or not, right?


(Naofumi Tomita) #12

You’re right, it matters only when you use those layers, as described in the documentation. BatchNorm and Dropout are meant to behave differently at evaluation time, so you need to toggle the mode manually. You could alternatively use model.train(False), which is equivalent. Also, make sure to call eval() at validation time.


#13

I use .eval() and the results are still incorrect.


(Shuokai Pan) #14

Hi guys,

I have a question about the behaviour of the dropout layer during training and evaluation. I remember reading in a paper that because dropout leaves out some units during training, the outgoing weights of the dropout layer need to be scaled down at evaluation time by an amount corresponding to the dropout rate. For instance, if the dropout rate is 0.5, the outgoing weights need to be halved, because during evaluation we effectively have twice the number of units.

So my question is: is this kind of weight-scaling mechanism included in PyTorch’s dropout layer as well?

Cheers and thanks a lot for your help.
Shuokai


(Anuvabh) #15

model.eval() takes care of this. However, I think it is scaling the activations and not the weights.


(Shuokai Pan) #16

Ok I understand. Thanks for the help.

Cheers


(Herleeyandi Markoni) #17

@smth Is model.train(False) the same as model.eval()?


(ProKil) #18

It is true that model.eval() takes care of this; however, the scaling actually happens during training, not at evaluation.

As the PyTorch docs note, the outputs are scaled by a factor of 1/(1-p) during training. This means that during evaluation the module simply computes an identity function.
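
The 1/(1-p) scaling is easy to verify yourself (illustrative snippet):

    import torch
    import torch.nn as nn

    drop = nn.Dropout(p=0.5)

    drop.train()
    print(drop(torch.ones(4)))  # surviving entries are scaled by 1/(1-0.5) = 2.0

    drop.eval()
    print(drop(torch.ones(4)))  # identity: tensor([1., 1., 1., 1.])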


(gj) #19

Newbie question…
Are there any conventions for filename extensions when saving a whole model vs. just the model weights with the following commands?

torch.save(the_model, PATH)

torch.save(the_model.state_dict(), PATH)


#20

We’ve been using .pth, but it’s pretty arbitrary.