Hi, I followed the PyTorch ImageNet example code to train a binary model, which has been saved as "model_best.pth.tar". Then I perform the following operations to load the model.
vgg = models.vgg19(pretrained=False)
vgg.load_state_dict(torch.load('model_best.pth.tar'))
cnn = vgg.features
cnn = cnn.cuda()
model = nn.Sequential()
model = model.cuda()
But it gives me the error "KeyError: 'unexpected key "arch" in state_dict'". I am not sure why load_state_dict cannot load the saved model. It appears that the original ImageNet code also saves args.arch in the checkpoint. Thanks for the help.
Probably you are loading a checkpoint dict that contains additional key-value pairs, e.g. the state_dict, the best accuracy, etc.
Could you print the result of torch.load('model...')?
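For reference, a rough sketch of what that usually looks like (assuming the checkpoint was written by the reference ImageNet script, which stores the weights next to other metadata such as the epoch and best accuracy):

checkpoint = torch.load('model_best.pth.tar')
print(checkpoint.keys())
# e.g. dict_keys(['epoch', 'arch', 'state_dict', 'best_prec1', 'optimizer'])

# load only the weights, not the surrounding metadata
vgg = models.vgg19(pretrained=False)
vgg.load_state_dict(checkpoint['state_dict'])

(If the features were wrapped in DataParallel during training, the keys will also carry a module. prefix; see below.)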
So the following should be the correct way to load the model.
cnn = models.vgg19()
cnn.features = torch.nn.DataParallel(cnn.features)
cnn.cuda()
checkpoint = torch.load('models/checkpoint.pth.tar')
cnn.load_state_dict(checkpoint['state_dict'])
In this case, the loading works fine. But if I want to iterate over the CNN,
for i, layer in enumerate(list(cnn)):
it gives me TypeError: 'VGG' object is not iterable. So I guess I am not loading the model the way I want?
Do you want to see all layers?
If so, you could iterate over .children() or .modules().
I’m answering on my mobile now, so I cannot check it, but I think you cannot iterate a model like this.
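Something along these lines should work (untested, typed from memory):

# direct submodules, e.g. features and classifier
for i, layer in enumerate(cnn.children()):
    print(i, layer)

# or walk every module recursively, with its qualified name
for name, module in cnn.named_modules():
    print(name, module)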
It appears that the model keys are also inconsistent.
When you use the following code:
cnn = models.vgg19()
cnn_state_dict = cnn.state_dict()
print(cnn_state_dict)
it gives you keys like "features.0.weight".
And if you load a previously trained model,
checkpoint = torch.load('models/checkpoint.pth.tar')
print(checkpoint['state_dict'].keys())
Then it will give you "KeyError: 'unexpected key "features.module.0.weight" in state_dict'" when loading. What are the solutions for this one?
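The extra module. level comes from the DataParallel wrapper around cnn.features, so the checkpoint stores features.module.0.weight while a plain vgg19 expects features.0.weight. Besides wrapping the features in DataParallel before loading (as in the snippet above), one workaround is to rename the keys, roughly like this:

from collections import OrderedDict

checkpoint = torch.load('models/checkpoint.pth.tar')
state_dict = checkpoint['state_dict']

# drop the '.module' level inserted by DataParallel,
# e.g. 'features.module.0.weight' -> 'features.0.weight'
new_state_dict = OrderedDict(
    (k.replace('.module.', '.', 1), v) for k, v in state_dict.items()
)

cnn = models.vgg19()
cnn.load_state_dict(new_state_dict)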
I am facing the same error for vgg13 and vgg19. These two websites talk about this problem, but I have not tested them yet. I am not sure whether this also applies to vgg13. BTW, vgg11 and vgg16 do not have this problem.
Today I retried the code, and there is no such bug. I do not know why. My code is something like this:
model_list = ['vgg11_bn', 'vgg13_bn', 'vgg16_bn', 'vgg19_bn']
for mdl_name in model_list:
    print(mdl_name)
    mdl_path = os.path.join(mdl_result_path, mdl_name)
    print(mdl_path)
    try:
        mdl_file = GetLatestFile(mdl_path)
        print('{} is loaded.'.format(mdl_file))
    except:
        print('The result for {} does not exist.'.format(mdl_name))
        continue
    mdl = pretrainedmodels.__dict__[mdl_name](num_classes=1000, pretrained='imagenet')
    ...
    mdl.load_state_dict(torch.load(mdl_file))
When the bug appears, it seems that the try/except part is not executed correctly, and I guess mdl is still the model from the previous iteration (e.g. vgg11_bn). The output is as follows:
vgg13_bn
~/results/vgg13_bn
Traceback (most recent call last):
File "InterValid.py", line 119, in <module>
if __name__ == '__main__':
File "InterValid.py", line 87, in main
File "~/.conda/envs/myenv/lib/python3.6/site-packages/torch/nn/modules/module.py", line 721, in load_state_dict
self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for VGG:
I changed the model list to ['vgg13_bn'], and now it works fine.
I have no idea why the try/except is not executed correctly…
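Just a guess, since GetLatestFile is your own helper: a bare except catches every exception, so any unexpected failure inside the try block is silently reported as "does not exist" and the loop moves on. Catching a specific exception (here I am assuming the helper raises FileNotFoundError when no result exists; adjust to whatever it actually raises) and printing the error should make it easier to see what really happens:

for mdl_name in model_list:
    mdl_path = os.path.join(mdl_result_path, mdl_name)
    try:
        mdl_file = GetLatestFile(mdl_path)
        print('{} is loaded.'.format(mdl_file))
    except FileNotFoundError as err:  # assumption: the helper raises this when nothing is found
        print('The result for {} does not exist: {}'.format(mdl_name, err))
        continue
    mdl = pretrainedmodels.__dict__[mdl_name](num_classes=1000, pretrained='imagenet')
    mdl.load_state_dict(torch.load(mdl_file))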
Hi @Fish, how did you solve this problem? I've encountered one somewhat similar to yours. Could you please take a look at it? Any suggestion would be great. Thanks!