Error loading trained model

Hi, I followed the PyTorch ImageNet example code to train a binary model, which was saved as “model_best.pth.tar”. I use the following steps to load the model:

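	import torch.nn as nn
	import torchvision.models as models

	# Untrained VGG19; keep only its convolutional "features" part on the GPU
	vgg = models.vgg19(pretrained=False)
	cnn = vgg.features
	cnn = cnn.cuda()
	# Empty nn.Sequential container, also moved to the GPU
	model = nn.Sequential()
	model = model.cuda()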

But it gives me the error “KeyError: 'unexpected key "arch" in state_dict'”. I am not sure why load_state_dict cannot load the model. It appears that the original ImageNet code also saves args.arch. Thanks for the help.

You are probably loading a checkpoint dict that holds several key-value pairs (e.g. the state_dict, the best accuracy, etc.) rather than a bare state_dict.
Could you print the result of torch.load('model...')?
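A quick way to see what such a file contains is to print its keys. The sketch below is illustrative only: it uses a tiny stand-in model (an assumption, not the VGG from the question) and an in-memory buffer instead of model_best.pth.tar, and the dict mirrors the layout the ImageNet example saves (optimizer state left out for brevity). Passing the whole dict to load_state_dict fails, while passing checkpoint['state_dict'] works. Recent PyTorch versions report the failure as a RuntimeError ("Unexpected key(s) in state_dict") rather than the KeyError quoted above.

```python
import io

import torch
import torch.nn as nn

# Tiny stand-in model (assumption); any nn.Module behaves the same way here.
model = nn.Sequential(nn.Linear(4, 3), nn.ReLU(), nn.Linear(3, 2))

# Same layout as the dict passed to save_checkpoint in the ImageNet example
# (optimizer state omitted to keep the sketch short).
buffer = io.BytesIO()
torch.save({
    'epoch': 1,
    'arch': 'toy',
    'state_dict': model.state_dict(),
    'best_prec1': 0.0,
}, buffer)
buffer.seek(0)

checkpoint = torch.load(buffer)
print(sorted(checkpoint.keys()))
# ['arch', 'best_prec1', 'epoch', 'state_dict']

restored = nn.Sequential(nn.Linear(4, 3), nn.ReLU(), nn.Linear(3, 2))
try:
    # Wrong: the checkpoint dict itself is not a state_dict.
    restored.load_state_dict(checkpoint)
except RuntimeError as err:
    print('loading the whole dict fails:', type(err).__name__)

# Right: pick out the nested state_dict entry.
restored.load_state_dict(checkpoint['state_dict'])
```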

Thanks for your reply. I followed the ImageNet example code, where the checkpoint is saved like this:

    save_checkpoint({
        'epoch': epoch + 1,
        'arch': args.arch,
        'state_dict': model.state_dict(),
        'best_prec1': best_prec1,
        'optimizer': optimizer.state_dict(),
    }, is_best)
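
For context, the save_checkpoint helper in that example essentially calls torch.save on this dict and, when is_best is true, copies the file to model_best.pth.tar, which is why model_best.pth.tar holds the whole dict rather than a bare state_dict. The sketch below follows that idea; the stand-in model and the best_filename parameter are assumptions added for illustration.

```python
import os
import shutil
import tempfile

import torch
import torch.nn as nn


def save_checkpoint(state, is_best, filename='checkpoint.pth.tar',
                    best_filename='model_best.pth.tar'):
    """Save the full training state and keep a copy of the best one.

    best_filename is a hypothetical extra parameter added for this sketch.
    """
    torch.save(state, filename)
    if is_best:
        shutil.copyfile(filename, best_filename)


model = nn.Linear(2, 2)  # stand-in model (assumption)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

with tempfile.TemporaryDirectory() as tmp:
    ckpt = os.path.join(tmp, 'checkpoint.pth.tar')
    best = os.path.join(tmp, 'model_best.pth.tar')
    save_checkpoint({
        'epoch': 1,
        'arch': 'linear',
        'state_dict': model.state_dict(),
        'best_prec1': 50.0,
        'optimizer': optimizer.state_dict(),
    }, is_best=True, filename=ckpt, best_filename=best)
    # The "best" file contains the whole dict, not just the weights.
    best_keys = sorted(torch.load(best).keys())

print(best_keys)
# ['arch', 'best_prec1', 'epoch', 'optimizer', 'state_dict']
```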

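So the following should be the correct way to load the model:

    import torch
    import torchvision.models as models
    cnn = models.vgg19()
    cnn.features = torch.nn.DataParallel(cnn.features)  # same wrapping as during training
    checkpoint = torch.load('models/checkpoint.pth.tar')
    cnn.load_state_dict(checkpoint['state_dict'])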
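Wrapping features in DataParallel before loading matters because the wrapper stores the wrapped module under an attribute named module, so every parameter key under features gains a "module." segment. The sketch below uses a small hypothetical TinyVGG with the same features/classifier layout as VGG (the real network is far larger). On a CPU-only machine DataParallel simply wraps the module, but the key names change in the same way.

```python
import torch.nn as nn


class TinyVGG(nn.Module):
    """Hypothetical stand-in with the same features/classifier layout as VGG."""

    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(nn.Conv2d(3, 4, 3), nn.ReLU())
        self.classifier = nn.Linear(4, 2)

    def forward(self, x):
        x = self.features(x).mean(dim=(2, 3))  # global average pooling
        return self.classifier(x)


plain = TinyVGG()
print(list(plain.state_dict().keys())[:2])
# ['features.0.weight', 'features.0.bias']

wrapped = TinyVGG()
# Parallelize only the features part, as the loading code above does.
wrapped.features = nn.DataParallel(wrapped.features)
print(list(wrapped.state_dict().keys())[:2])
# ['features.module.0.weight', 'features.module.0.bias']

# A state_dict saved from the wrapped model loads cleanly into an equally wrapped one.
wrapped_copy = TinyVGG()
wrapped_copy.features = nn.DataParallel(wrapped_copy.features)
wrapped_copy.load_state_dict(wrapped.state_dict())
```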

In this case, the loading works fine. But if I want to iterate over the layers of the CNN,

    for i, layer in enumerate(list(cnn)):
        print(i, layer)

it gives me "TypeError: 'VGG' object is not iterable". So am I not loading the model the way I want?

Do you want to see all layers?
If so, you could iterate over .children() or .modules().
I’m answering from my phone right now, so I cannot check it, but I don't think you can iterate over a model like this.
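A small sketch of the difference, using a hypothetical TinyVGG that, like torchvision's VGG, is a plain nn.Module with features and classifier submodules. A plain nn.Module defines no __iter__, so list(cnn) raises TypeError; named_children() yields only direct submodules, modules() walks the whole tree (including the model itself), and an nn.Sequential such as features can be enumerated directly.

```python
import torch.nn as nn


class TinyVGG(nn.Module):
    """Hypothetical stand-in: like VGG, a plain nn.Module, not an nn.Sequential."""

    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(nn.Conv2d(3, 4, 3), nn.ReLU(), nn.MaxPool2d(2))
        self.classifier = nn.Sequential(nn.Linear(4, 2))


cnn = TinyVGG()

try:
    list(cnn)  # a plain nn.Module defines no __iter__
except TypeError as err:
    print(err)  # 'TinyVGG' object is not iterable

# Direct submodules only: features, classifier
for name, child in cnn.named_children():
    print(name, type(child).__name__)

# nn.Sequential is iterable, so the layers inside features can be enumerated
for i, layer in enumerate(cnn.features):
    print(i, layer)

# modules() recurses: cnn, features, Conv2d, ReLU, MaxPool2d, classifier, Linear
print(len(list(cnn.modules())))  # 7
```

If features has been wrapped in DataParallel, as in the loading code earlier in the thread, the wrapper itself is not iterable either, so iterate over cnn.features.module instead.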

It appears that the model keys are also inconsistent.

When you use the following code:

    cnn = models.vgg19()
    cnn_state_dict = cnn.state_dict()

the keys look like “features.0.weight”.

But if you load a previously trained model into it,

    checkpoint = torch.load('models/checkpoint.pth.tar')
    cnn.load_state_dict(checkpoint['state_dict'])

you get “KeyError: 'unexpected key "features.module.0.weight" in state_dict'”. What are the solutions for this one?
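Two common fixes are to wrap cnn.features in DataParallel before loading, so the key names match, or to rename the keys by dropping the "module." segment and load into the plain model. Below is a sketch of the second option; strip_module_prefix and TinyVGG are hypothetical names for illustration. Note that the helper would also strip a genuine submodule that happens to be named "module".

```python
from collections import OrderedDict

import torch
import torch.nn as nn


def strip_module_prefix(state_dict):
    """Hypothetical helper: drop every 'module.' segment added by DataParallel.

    'features.module.0.weight' -> 'features.0.weight'
    """
    cleaned = OrderedDict()
    for key, value in state_dict.items():
        parts = [part for part in key.split('.') if part != 'module']
        cleaned['.'.join(parts)] = value
    return cleaned


class TinyVGG(nn.Module):
    """Hypothetical stand-in with VGG's features/classifier layout."""

    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(nn.Conv2d(3, 4, 3), nn.ReLU())
        self.classifier = nn.Linear(4, 2)


# Simulate a checkpoint saved while features was wrapped in DataParallel.
trained = TinyVGG()
trained.features = nn.DataParallel(trained.features)
checkpoint = {'state_dict': trained.state_dict()}

# Load the renamed keys into an unwrapped model.
plain = TinyVGG()
plain.load_state_dict(strip_module_prefix(checkpoint['state_dict']))
# .cpu() because DataParallel may have moved the trained weights to a GPU
print(torch.equal(plain.features[0].weight, trained.features.module[0].weight.cpu()))  # True
```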

I am facing the same error for VGG13 and VGG19. These two websites talk about this problem, but I have not tested their suggestions yet, and I am not sure whether they apply to VGG13 as well. By the way, VGG11 and VGG16 do not have this problem.

Today I retried the code, and the bug did not occur. I do not know why. My code is something like this:

    import os
    import pretrainedmodels

    # GetLatestFile and mdl_result_path are defined elsewhere in my script
    model_list = ['vgg11_bn', 'vgg13_bn', 'vgg16_bn', 'vgg19_bn']
    for mdl_name in model_list:
        mdl_path = os.path.join(mdl_result_path, mdl_name)
        try:
            mdl_file = GetLatestFile(mdl_path)
            print('{} is loaded.'.format(mdl_file))
        except Exception:  # exception type not shown in the original post
            print('The result for {} does not exist.'.format(mdl_name))

        mdl = pretrainedmodels.__dict__[mdl_name](num_classes=1000, pretrained='imagenet')



When the bug occurs, it seems that the try/except part is not executed correctly, and I guess mdl still holds the model from a previous iteration (e.g. vgg11_bn). The output is as follows:

    Traceback (most recent call last):
      File "", line 119, in <module>
        if __name__ == '__main__':
      File "", line 87, in main
      File "~/.conda/envs/myenv/lib/python3.6/site-packages/torch/nn/modules/", line 721, in load_state_dict
        self.__class__.__name__, "\n\t".join(error_msgs)))
    RuntimeError: Error(s) in loading state_dict for VGG:

I changed the model list to ['vgg13_bn'], and now it works fine.

I have no idea why the try/except part is not executed correctly.
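One possible explanation, consistent with the guess above: in Python, a name assigned inside a loop keeps its last value, so if the except branch only prints a message and does not skip the rest of the iteration, later code still sees the value from the previous pass. In the loop shown earlier, that stale name would be mdl_file, so a vgg11_bn checkpoint could end up being loaded into a vgg13_bn model. The sketch below is illustrative only; find_checkpoint and pair_models_with_files are hypothetical stand-ins (strings instead of models), and it shows the pattern plus one fix, a continue in the except branch.

```python
def find_checkpoint(name, available):
    """Hypothetical stand-in for GetLatestFile: raise if nothing was saved."""
    if name not in available:
        raise FileNotFoundError(name)
    return name + '/checkpoint.pth.tar'


def pair_models_with_files(model_list, available, skip_missing):
    pairs = []
    for mdl_name in model_list:
        try:
            mdl_file = find_checkpoint(mdl_name, available)
        except FileNotFoundError:
            print('The result for {} does not exist.'.format(mdl_name))
            if skip_missing:
                continue  # fix: do not fall through with a stale mdl_file
        # Without the continue, mdl_file may still hold the previous model's path.
        pairs.append((mdl_name, mdl_file))
    return pairs


model_list = ['vgg11_bn', 'vgg13_bn']
available = {'vgg11_bn'}
print(pair_models_with_files(model_list, available, skip_missing=False))
# [('vgg11_bn', 'vgg11_bn/checkpoint.pth.tar'), ('vgg13_bn', 'vgg11_bn/checkpoint.pth.tar')]
print(pair_models_with_files(model_list, available, skip_missing=True))
# [('vgg11_bn', 'vgg11_bn/checkpoint.pth.tar')]
```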

