Saving and loading a model in PyTorch?

This is how I do it:

torch.save(net.state_dict(), model_save_path + '_.pth')

save_checkpoint({
    'epoch': epoch + 1,
    # 'arch': args.arch,
    'state_dict': net.state_dict(),
    'optimizer': optimizer.state_dict(),
}, is_best, mPath, str(val_acc) + '_' + str(val_los) + '_' + str(epoch) + '_checkpoint.pth.tar')

Where:

import os
import shutil

import torch

def save_checkpoint(state, is_best, save_path, filename):
    filename = os.path.join(save_path, filename)
    torch.save(state, filename)
    if is_best:
        # keep a copy of the best checkpoint under a fixed name
        bestname = os.path.join(save_path, 'model_best.pth.tar')
        shutil.copyfile(filename, bestname)

Huh, I'm confused: is torch.save(model, …) actually wrong, and should we be using torch.save(model.state_dict(), …) instead?

No, not wrong, just a different approach. With the former, the whole object gets pickled; with the latter, only its parameters get pickled. Since pickle can be quite a mess when it comes to import dependencies, I would generally recommend the latter approach, especially if you are planning to run the model on a different machine.
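A minimal sketch contrasting the two approaches (the file names are just placeholders; the `weights_only` flag exists in newer PyTorch versions and is needed there to unpickle a full module):

```python
import torch
import torch.nn as nn

net = nn.Linear(4, 2)

# Former approach: pickle the whole module object.
# Loading later requires the class definition to be importable.
torch.save(net, 'whole_model.pth')
restored = torch.load('whole_model.pth', weights_only=False)

# Latter approach (recommended): save only the parameters,
# then re-create the architecture and load the weights into it.
torch.save(net.state_dict(), 'params.pth')
restored2 = nn.Linear(4, 2)
restored2.load_state_dict(torch.load('params.pth'))
```

With the state_dict route, the loading code only depends on your model class and PyTorch itself, not on the exact module layout that was pickled.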


I find that my model accuracy drops a little when I load a saved checkpoint, compared to before I saved the state.

Here is the dict I'm saving using torch.save(…):

save_checkpoint({
    'epoch': cur_epoch,
    'state_dict': model.state_dict(),
    'best_prec': best_prec,
    'loss_train': loss_train,
    'optimizer': optimizer.state_dict(),
}, is_best, OUT_DIR, 'acc-{:.4f}_loss-{:.4f}_epoch-{}_checkpoint.pth.tar'.format(val_acc, val_loss, cur_epoch))

And here is how I load a saved state:

def load_checkpoint(load_path, model, optimizer):
    """Loads state into model and optimizer and returns:
       epoch, best_precision, loss_train[]
    """
    if os.path.isfile(load_path):
        print("=> loading checkpoint '{}'".format(load_path))
        checkpoint = torch.load(load_path)
        epoch = checkpoint['epoch']
        best_prec = checkpoint['best_prec']
        loss_train = checkpoint['loss_train']
        model.load_state_dict(checkpoint['state_dict'])
        optimizer.load_state_dict(checkpoint['optimizer'])
        print("=> loaded checkpoint '{}' (epoch {})"
              .format(load_path, epoch))
        return epoch, best_prec, loss_train
    else:
        print("=> no checkpoint found at '{}'".format(load_path))
        # epoch, best_precision, loss_train
        return 1, 0, []

Can anybody spot what I am doing wrong?


Should I care about the mode of the model (train vs. eval) when saving it via the state_dict method?

Are you able to reproduce the results this way? Don't we need to store the optimizer states as well?

Don’t we need to store the optimizer states as well?

If your optimizer has internal state, e.g. Adam with its running moment estimates, then yes, you should store it as well.
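To illustrate why: after a single step, Adam already holds per-parameter buffers that a fresh optimizer would lack (a small sketch; the model and learning rate are arbitrary):

```python
import torch
import torch.nn as nn

model = nn.Linear(2, 1)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# One step is enough for Adam to populate its internal state
# (step count plus first/second moment buffers per parameter).
model(torch.randn(4, 2)).sum().backward()
optimizer.step()

# {'state': {...per-param buffers...}, 'param_groups': [...]}
opt_state = optimizer.state_dict()

# When resuming, restore it into a freshly constructed optimizer:
new_optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
new_optimizer.load_state_dict(opt_state)
```

Without restoring this, a resumed Adam run effectively restarts its moment estimates from zero, which can change training behavior after the checkpoint.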

I'm forking the official word-level language modelling example. I can't find an explicit optimizer there; all I have is a loss.backward(). I'm not able to reproduce the results by saving the model as explained here.

In this tutorial the weight updates are performed manually in this line of code.
Since there is no optimizer with internal estimates, you don't have to store anything regarding the optimization.
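A sketch of that kind of manual update (variable names are mine, not the tutorial's): the step is plain SGD, `p ← p − lr·grad`, applied directly to the parameters, so the only thing worth checkpointing is the model's state_dict.

```python
import torch
import torch.nn as nn

model = nn.Linear(3, 1)
lr = 0.1

before = model.weight.detach().clone()
model(torch.randn(5, 3)).sum().backward()

# Manual SGD step: no optimizer object involved, hence no
# optimizer state_dict to save in the checkpoint.
with torch.no_grad():
    for p in model.parameters():
        p -= lr * p.grad
model.zero_grad()
```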

I saved the model that produced the following training curve.

On reproducing the results with the saved model, my test error is:

End of training | test loss 17.28 | test ppl 32104905.14

The generated text is gibberish.

Something is terribly wrong. I’m not sure where I should check :confused:

Something looks fishy. Could you create a new thread and post your complete issue there?
It would also be easier to debug if you could post your code so that we can have a look.


I started a new thread here.

Hi! I have a problem with loading my model. I'm training VGG19 on CIFAR-10 in Colab; loading it in Colab works fine, but loading it on my laptop with the same code gives an error. Both use Python 3, and the model was trained with CUDA.
Error:

Save code

def save_checkpoint(filename):
    torch.save({'state_dict': net.state_dict(),
                'optimizer': optimizer.state_dict(),
                }, filename)

Load

checkpoint = torch.load('./vgg19_200.pth')
net.load_state_dict(checkpoint['state_dict'])
optimizer.load_state_dict(checkpoint['optimizer'])
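One common cause of this kind of error, when the checkpoint was written from CUDA tensors and is loaded on a machine without a GPU, is a missing map_location argument. A sketch with a stand-in checkpoint file (the file name and contents are placeholders):

```python
import torch

# Stand-in for a checkpoint produced on the training machine.
torch.save({'state_dict': {'w': torch.zeros(2)}}, 'ckpt.pth')

# map_location remaps all storages to CPU, so a checkpoint saved
# from CUDA tensors also loads on a CPU-only machine.
checkpoint = torch.load('ckpt.pth', map_location='cpu')
```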

Hello, I'm trying to save my Adam optimizer, but why is the state_dict always different when I load it after restarting my environment?

I've also made my own thread here: Saved model have higher loss

thank you

There's no such thing as a mistake here; I think either way is OK.

Is there a way to save and load models from s3 directly?

Have you solved this problem? I have encountered it too, and I don't know where it goes wrong.

Yes, if you use io.BytesIO (torch.save writes binary data, so a bytes stream rather than StringIO) you can create an in-memory file stream, write your model state to it, and then push that to s3.

What I additionally do is use joblib to compress and pickle after writing to the stream, push that to s3, then load it back with joblib into a file-stream object and read the model state back into a model object to resume.
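A sketch of the buffer round-trip (the bucket/key names are placeholders, and the client is any boto3-style object exposing upload_fileobj/download_fileobj; the function names are mine):

```python
import io

import torch

def save_state_to_s3(model, s3_client, bucket, key):
    # Serialize the state_dict into an in-memory binary stream...
    buf = io.BytesIO()
    torch.save(model.state_dict(), buf)
    buf.seek(0)
    # ...then push the stream straight to s3, no temp file needed.
    s3_client.upload_fileobj(buf, bucket, key)

def load_state_from_s3(model, s3_client, bucket, key):
    buf = io.BytesIO()
    s3_client.download_fileobj(bucket, key, buf)
    buf.seek(0)
    model.load_state_dict(torch.load(buf))
```

With boto3 the client would come from boto3.client('s3'); joblib compression can be layered on by dumping/loading the buffer contents with joblib around the same stream.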

It's not necessary; using .copy() will work fine too. :slight_smile:

They just mean when you need to evaluate or run inference.