[solved] KeyError: 'unexpected key "module.encoder.embedding.weight" in state_dict'

I am having the same problem, and using the trick with OrderedDict does not work. I am using PyTorch 0.3, in case anything has changed.

I have a word embedding layer that was trained along with the classification task. Training was successful, but loading the model gave the following error:

Traceback (most recent call last):
  File "source/test.py", line 72, in <module>
    helper.load_model_states_from_checkpoint(model, args.save_path + 'model_best.pth.tar', 'state_dict', args.cuda)
  File "/u/flashscratch/flashscratch1/d/datduong/universalSentenceEncoder/source/helper.py", line 55, in load_model_states_from_checkpoint
    model.load_state_dict(checkpoint[tag])
  File "/u/home/d/datduong/project/anaconda3/lib/python3.5/site-packages/torch/nn/modules/module.py", line 490, in load_state_dict
    .format(name))
KeyError: 'unexpected key "embedding.embedding.embedding.weight" in state_dict'

The key embedding.embedding.embedding.weight exists (see image). Please let me know what to do.

In my opinion, this question and answer should be in something like an FAQ :slight_smile:

11 Likes

Check out your saved model file:

check_point = torch.load('myfile.pth.tar')
check_point.keys()

You may find that your 'check_point' has several keys, such as 'state_dict'.

checkpoint = torch.load(resume)
state_dict =checkpoint['state_dict']

from collections import OrderedDict
new_state_dict = OrderedDict()
for k, v in state_dict.items():
    name = k[7:] # remove 'module.' of dataparallel
    new_state_dict[name]=v

model.load_state_dict(new_state_dict)
7 Likes
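
The slice k[7:] in the recipe above drops the first seven characters of every key, whether or not the key actually starts with 'module.'. A minimal sketch of a more defensive variant follows. The helper name strip_module_prefix is hypothetical, and plain integers stand in for tensors so the snippet runs without PyTorch.

```python
from collections import OrderedDict


def strip_module_prefix(state_dict, prefix="module."):
    """Return a copy of state_dict with a leading `prefix` removed from keys.

    Keys that do not start with `prefix` are copied unchanged, so the
    function is safe to call on checkpoints saved without DataParallel.
    """
    new_state_dict = OrderedDict()
    for key, value in state_dict.items():
        if key.startswith(prefix):
            key = key[len(prefix):]
        new_state_dict[key] = value
    return new_state_dict


# Placeholder values stand in for tensors in this illustration.
saved = OrderedDict([
    ("module.encoder.embedding.weight", 1),
    ("decoder.bias", 2),
])
cleaned = strip_module_prefix(saved)
print(list(cleaned.keys()))
```

With a real checkpoint, the result would be passed to model.load_state_dict(...) in the same way as new_state_dict above.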

What about nn.DistributedDataParallel? It seems DistributedDataParallel and DataParallel can load each other’s parameters.
Is there an official way to save/load among DDP/DP/None?

just do this:

model = torch.load(train_model)

net.load_state_dict(model['state_dict'])

it works for me!

Thanks a lot, this worked for me.

Instead of deleting the “module.” string from all the state_dict keys, you can save your model with:
torch.save(model.module.state_dict(), path_to_file)
instead of
torch.save(model.state_dict(), path_to_file)
that way you don’t get the “module.” string to begin with…

15 Likes

Thanks for your hints! It saved my time:rose:

That works simply and perfectly for me! Thanks.

1 Like

In case someone needs, this function can handle loading weights w/ and w/o ‘module’.
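
The function referred to above is not shown in the thread. As a rough sketch of what such a helper might do, the saved keys can be compared with the keys the target model expects, and the 'module.' prefix added or removed to match. The name match_module_prefix is hypothetical, and placeholder values stand in for tensors so the snippet runs without PyTorch.

```python
from collections import OrderedDict

PREFIX = "module."


def match_module_prefix(saved_state, expected_keys):
    """Rename saved keys so their 'module.' prefix matches the target model.

    saved_state:   mapping loaded from the checkpoint
    expected_keys: iterable of keys, e.g. model.state_dict().keys()

    Caveat: a plain model whose only top-level submodule is literally
    named 'module' would be mistaken for a wrapped model.
    """
    expected = list(expected_keys)
    # The target expects the prefix only if all of its own keys carry it.
    target_wrapped = bool(expected) and all(k.startswith(PREFIX) for k in expected)

    adapted = OrderedDict()
    for key, value in saved_state.items():
        has_prefix = key.startswith(PREFIX)
        if target_wrapped and not has_prefix:
            key = PREFIX + key
        elif not target_wrapped and has_prefix:
            key = key[len(PREFIX):]
        adapted[key] = value
    return adapted


saved_parallel = {"module.fc.weight": 1, "module.fc.bias": 2}
saved_plain = {"fc.weight": 1, "fc.bias": 2}
plain_model_keys = ["fc.weight", "fc.bias"]
wrapped_model_keys = ["module.fc.weight", "module.fc.bias"]

print(list(match_module_prefix(saved_parallel, plain_model_keys)))
print(list(match_module_prefix(saved_plain, wrapped_model_keys)))
```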

To save model without ‘module’, you may try this.
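
The snippet this post pointed to is not shown here. One way to do this, sketched under the assumption that the wrapper exposes the original network as .module (as the nn.DataParallel posts in this thread describe), is to unwrap the model before asking for its state_dict. The helper name and the two stand-in classes are hypothetical; they only mimic the relevant attributes so the snippet runs without PyTorch.

```python
def unwrapped_state_dict(model):
    """Return the state_dict of the underlying network.

    Assumption: a parallel wrapper exposes the wrapped network as `.module`.
    Caveat: a plain model with its own attribute called `module` would be
    unwrapped too; an isinstance check against the wrapper class is stricter.
    """
    inner = getattr(model, "module", None)
    if inner is not None and hasattr(inner, "state_dict"):
        model = inner
    return model.state_dict()


class TinyNet:
    """Stand-in for a model; real code would subclass torch.nn.Module."""

    def state_dict(self):
        return {"fc.weight": 1, "fc.bias": 2}


class FakeParallel:
    """Stand-in for a wrapper that prefixes keys with 'module.'."""

    def __init__(self, module):
        self.module = module

    def state_dict(self):
        return {"module." + k: v for k, v in self.module.state_dict().items()}


net = TinyNet()
print(unwrapped_state_dict(net))
print(unwrapped_state_dict(FakeParallel(net)))
# With PyTorch: torch.save(unwrapped_state_dict(model), path_to_file)
```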

1 Like

After PyTorch 1.xx this was fixed; now you only need to do this:

if isinstance(args.pretrained, torch.nn.DataParallel):
    args.pretrained = args.pretrained.module

@weblucas

What is args in this case? Is it project-specific?

1 Like

The code I am using saves the whole model with torch.save(model)… In this case the model is loaded with args.pretrained = torch.load(args.pretrained).
In the single-GPU case, the loaded model is one of my models, MyModelNet(nn.Module), but in the multi-GPU case it is nn.DataParallel(MyModelNet(nn.Module)).

OK, so that wouldn’t really fix the loading problem, but it will help save the correct state_dict() depending on whether your model is parallelized or not.

A more graceful solution is:

name = k.replace(".module", "") # removing '.module' from key

For me, using k[7:] wasn’t properly removing the ‘module’.
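
Slicing with k[7:] only works when 'module.' is the very first thing in the key. The short illustration below compares slicing, replace, and a component-wise variant on two keys that appear in this thread. The helper drop_module_component is a hypothetical name, and it would also drop a genuine submodule that happens to be called module.

```python
def drop_module_component(key):
    """Remove every dotted component that is exactly 'module'."""
    return ".".join(part for part in key.split(".") if part != "module")


leading = "module.encoder.embedding.weight"   # prefix at the start
nested = "synth_reader.module.0.weight"       # prefix in the middle

for k in (leading, nested):
    print(k)
    print("  k[7:]                  ->", k[7:])
    print("  k.replace('.module')   ->", k.replace(".module", ""))
    print("  drop_module_component  ->", drop_module_component(k))
```

Slicing mangles the nested key, while replace(".module", "") leaves a leading 'module.' untouched, so which one is appropriate depends on where the prefix sits in the saved keys.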

2 Likes

@fmassa

I used your code to remove the unexpected keys, but I cannot get past this error.
I also tried all the other tricks.

Please give me some hints on how to solve it. It would be much appreciated.

state_dict = torch.load("/media/Data/jcl-vb/output_dir/model_0000050.pth")
from collections import OrderedDict
new_state_dict = OrderedDict()
for k, v in state_dict.items():
    name = k[9:] # remove module.
    new_state_dict[name] = v
model.load_state_dict(new_state_dict)

However, it shows the error like:

RuntimeError: Error(s) in loading state_dict for GeneralizedRCNN:
Missing key(s) in state_dict: “backbone.body.stem.conv1.weight”, “backbone.body.stem.bn1.weight”, “backbone.body.stem.bn1.bias”, “backbone.body.stem.bn1.running_mean”, “backbone.body.stem.bn1.running_var”, “backbone.body.layer1.0.downsample.0.weight”, “backbone.body.layer1.0.downsample.1.weight”, “backbone.body.layer1.0.downsample.1.bias”, “backbone.body.layer1.0.downsample.1.running_mean”, “backbone.body.layer1.0.downsample.1.running_var”, “backbone.body.layer1.0.conv1.weight”, “backbone.body.layer1.0.bn1.weight”, “backbone.body.layer1.0.bn1.bias”, “backbone.body.layer1.0.bn1.running_mean”, “backbone.body.layer1.0.bn1.running_var”, “backbone.body.layer1.0.conv2.weight”, “backbone.body.layer1.0.bn2.weight”, “backbone.body.layer1.0.bn2.bias”, “backbone.body.layer1.0.bn2.running_mean”, “backbone.body.layer1.0.bn2.running_var”, “backbone.body.layer1.0.conv3.weight”, “backbone.body.layer1.0.bn3.weight”, “backbone.body.layer1.0.bn3.bias”, “backbone.body.layer1.0.bn3.running_mean”, “backbone.body.layer1.0.bn3.running_var”, “backbone.body.layer1.1.conv1.weight”, “backbone.body.layer1.1.bn1.weight”, “backbone.body.layer1.1.bn1.bias”, “backbone.body.layer1.1.bn1.running_mean”, “backbone.body.layer1.1.bn1.running_var”, “backbone.body.layer1.1.conv2.weight”, “backbone.body.layer1.1.bn2.weight”, “backbone.body.layer1.1.bn2.bias”, “backbone.body.layer1.1.bn2.running_mean”, “backbone.body.layer1.1.bn2.running_var”, “backbone.body.layer1.1.conv3.weight”, “backbone.body.layer1.1.bn3.weight”, “backbone.body.layer1.1.bn3.bias”, “backbone.body.layer1.1.bn3.running_mean”, “backbone.body.layer1.1.bn3.running_var”, “backbone.body.layer1.2.conv1.weight”, “backbone.body.layer1.2.bn1.weight”, “backbone.body.layer1.2.bn1.bias”, “backbone.body.layer1.2.bn1.running_mean”, “backbone.body.layer1.2.bn1.running_var”, “backbone.body.layer1.2.conv2.weight”, “backbone.body.layer1.2.bn2.weight”, “backbone.body.layer1.2.bn2.bias”, “backbone.body.layer1.2.bn2.running_mean”, 
“backbone.body.layer1.2.bn2.running_var”, “backbone.body.layer1.2.conv3.weight”, “backbone.body.layer1.2.bn3.weight”, “backbone.body.layer1.2.bn3.bias”, “backbone.body.layer1.2.bn3.running_mean”, “backbone.body.layer1.2.bn3.running_var”, “backbone.body.layer2.0.downsample.0.weight”, “backbone.body.layer2.0.downsample.1.weight”, “backbone.body.layer2.0.downsample.1.bias”, “backbone.body.layer2.0.downsample.1.running_mean”, “backbone.body.layer2.0.downsample.1.running_var”, “backbone.body.layer2.0.conv1.weight”, “backbone.body.layer2.0.bn1.weight”, “backbone.body.layer2.0.bn1.bias”, “backbone.body.layer2.0.bn1.running_mean”, “backbone.body.layer2.0.bn1.running_var”, “backbone.body.layer2.0.conv2.weight”, “backbone.body.layer2.0.bn2.weight”, “backbone.body.layer2.0.bn2.bias”, “backbone.body.layer2.0.bn2.running_mean”, “backbone.body.layer2.0.bn2.running_var”, “backbone.body.layer2.0.conv3.weight”, “backbone.body.layer2.0.bn3.weight”, “backbone.body.layer2.0.bn3.bias”, “backbone.body.layer2.0.bn3.running_mean”, “backbone.body.layer2.0.bn3.running_var”, “backbone.body.layer2.1.conv1.weight”, “backbone.body.layer2.1.bn1.weight”, “backbone.body.layer2.1.bn1.bias”, “backbone.body.layer2.1.bn1.running_mean”, “backbone.body.layer2.1.bn1.running_var”, “backbone.body.layer2.1.conv2.weight”, “backbone.body.layer2.1.bn2.weight”, “backbone.body.layer2.1.bn2.bias”, “backbone.body.layer2.1.bn2.running_mean”, “backbone.body.layer2.1.bn2.running_var”, “backbone.body.layer2.1.conv3.weight”, “backbone.body.layer2.1.bn3.weight”, “backbone.body.layer2.1.bn3.bias”, “backbone.body.layer2.1.bn3.running_mean”, “backbone.body.layer2.1.bn3.running_var”, “backbone.body.layer2.2.conv1.weight”, “backbone.body.layer2.2.bn1.weight”, “backbone.body.layer2.2.bn1.bias”, “backbone.body.layer2.2.bn1.running_mean”, “backbone.body.layer2.2.bn1.running_var”, “backbone.body.layer2.2.conv2.weight”, “backbone.body.layer2.2.bn2.weight”, “backbone.body.layer2.2.bn2.bias”, 
“backbone.body.layer2.2.bn2.running_mean”, “backbone.body.layer2.2.bn2.running_var”, “backbone.body.layer2.2.conv3.weight”, “backbone.body.layer2.2.bn3.weight”, “backbone.body.layer2.2.bn3.bias”, “backbone.body.layer2.2.bn3.running_mean”, “backbone.body.layer2.2.bn3.running_var”, “backbone.body.layer2.3.conv1.weight”, “backbone.body.layer2.3.bn1.weight”, “backbone.body.layer2.3.bn1.bias”, “backbone.body.layer2.3.bn1.running_mean”, “backbone.body.layer2.3.bn1.running_var”, “backbone.body.layer2.3.conv2.weight”, “backbone.body.layer2.3.bn2.weight”, “backbone.body.layer2.3.bn2.bias”, “backbone.body.layer2.3.bn2.running_mean”, “backbone.body.layer2.3.bn2.running_var”, “backbone.body.layer2.3.conv3.weight”, “backbone.body.layer2.3.bn3.weight”, “backbone.body.layer2.3.bn3.bias”, “backbone.body.layer2.3.bn3.running_mean”, “backbone.body.layer2.3.bn3.running_var”, “backbone.body.layer3.0.downsample.0.weight”, “backbone.body.layer3.0.downsample.1.weight”, “backbone.body.layer3.0.downsample.1.bias”, “backbone.body.layer3.0.downsample.1.running_mean”, “backbone.body.layer3.0.downsample.1.running_var”, “backbone.body.layer3.0.conv1.weight”, “backbone.body.layer3.0.bn1.weight”, “backbone.body.layer3.0.bn1.bias”, “backbone.body.layer3.0.bn1.running_mean”, “backbone.body.layer3.0.bn1.running_var”, “backbone.body.layer3.0.conv2.weight”, “backbone.body.layer3.0.bn2.weight”, “backbone.body.layer3.0.bn2.bias”, “backbone.body.layer3.0.bn2.running_mean”, “backbone.body.layer3.0.bn2.running_var”, “backbone.body.layer3.0.conv3.weight”, “backbone.body.layer3.0.bn3.weight”, “backbone.body.layer3.0.bn3.bias”, “backbone.body.layer3.0.bn3.running_mean”, “backbone.body.layer3.0.bn3.running_var”, “backbone.body.layer3.1.conv1.weight”, “backbone.body.layer3.1.bn1.weight”, “backbone.body.layer3.1.bn1.bias”, “backbone.body.layer3.1.bn1.running_mean”, “backbone.body.layer3.1.bn1.running_var”, “backbone.body.layer3.1.conv2.weight”, “backbone.body.layer3.1.bn2.weight”, 
“backbone.body.layer3.1.bn2.bias”, “backbone.body.layer3.1.bn2.running_mean”, “backbone.body.layer3.1.bn2.running_var”, “backbone.body.layer3.1.conv3.weight”, “backbone.body.layer3.1.bn3.weight”, “backbone.body.layer3.1.bn3.bias”, “backbone.body.layer3.1.bn3.running_mean”, “backbone.body.layer3.1.bn3.running_var”, “backbone.body.layer3.2.conv1.weight”, “backbone.body.layer3.2.bn1.weight”, “backbone.body.layer3.2.bn1.bias”, “backbone.body.layer3.2.bn1.running_mean”, “backbone.body.layer3.2.bn1.running_var”, “backbone.body.layer3.2.conv2.weight”, “backbone.body.layer3.2.bn2.weight”, “backbone.body.layer3.2.bn2.bias”, “backbone.body.layer3.2.bn2.running_mean”, “backbone.body.layer3.2.bn2.running_var”, “backbone.body.layer3.2.conv3.weight”, “backbone.body.layer3.2.bn3.weight”, “backbone.body.layer3.2.bn3.bias”, “backbone.body.layer3.2.bn3.running_mean”, “backbone.body.layer3.2.bn3.running_var”, “backbone.body.layer3.3.conv1.weight”, “backbone.body.layer3.3.bn1.weight”, “backbone.body.layer3.3.bn1.bias”, “backbone.body.layer3.3.bn1.running_mean”, “backbone.body.layer3.3.bn1.running_var”, “backbone.body.layer3.3.conv2.weight”, “backbone.body.layer3.3.bn2.weight”, “backbone.body.layer3.3.bn2.bias”, “backbone.body.layer3.3.bn2.running_mean”, “backbone.body.layer3.3.bn2.running_var”, “backbone.body.layer3.3.conv3.weight”, “backbone.body.layer3.3.bn3.weight”, “backbone.body.layer3.3.bn3.bias”, “backbone.body.layer3.3.bn3.running_mean”, “backbone.body.layer3.3.bn3.running_var”, “backbone.body.layer3.4.conv1.weight”, “backbone.body.layer3.4.bn1.weight”, “backbone.body.layer3.4.bn1.bias”, “backbone.body.layer3.4.bn1.running_mean”, “backbone.body.layer3.4.bn1.running_var”, “backbone.body.layer3.4.conv2.weight”, “backbone.body.layer3.4.bn2.weight”, “backbone.body.layer3.4.bn2.bias”, “backbone.body.layer3.4.bn2.running_mean”, “backbone.body.layer3.4.bn2.running_var”, “backbone.body.layer3.4.conv3.weight”, “backbone.body.layer3.4.bn3.weight”, “backbone.body.layer3.4.bn3.bias”, 
“backbone.body.layer3.4.bn3.running_mean”, “backbone.body.layer3.4.bn3.running_var”, “backbone.body.layer3.5.conv1.weight”, “backbone.body.layer3.5.bn1.weight”, “backbone.body.layer3.5.bn1.bias”, “backbone.body.layer3.5.bn1.running_mean”, “backbone.body.layer3.5.bn1.running_var”, “backbone.body.layer3.5.conv2.weight”, “backbone.body.layer3.5.bn2.weight”, “backbone.body.layer3.5.bn2.bias”, “backbone.body.layer3.5.bn2.running_mean”, “backbone.body.layer3.5.bn2.running_var”, “backbone.body.layer3.5.conv3.weight”, “backbone.body.layer3.5.bn3.weight”, “backbone.body.layer3.5.bn3.bias”, “backbone.body.layer3.5.bn3.running_mean”, “backbone.body.layer3.5.bn3.running_var”, “backbone.body.layer4.0.downsample.0.weight”, “backbone.body.layer4.0.downsample.1.weight”, “backbone.body.layer4.0.downsample.1.bias”, “backbone.body.layer4.0.downsample.1.running_mean”, “backbone.body.layer4.0.downsample.1.running_var”, “backbone.body.layer4.0.conv1.weight”, “backbone.body.layer4.0.bn1.weight”, “backbone.body.layer4.0.bn1.bias”, “backbone.body.layer4.0.bn1.running_mean”, “backbone.body.layer4.0.bn1.running_var”, “backbone.body.layer4.0.conv2.weight”, “backbone.body.layer4.0.bn2.weight”, “backbone.body.layer4.0.bn2.bias”, “backbone.body.layer4.0.bn2.running_mean”, “backbone.body.layer4.0.bn2.running_var”, “backbone.body.layer4.0.conv3.weight”, “backbone.body.layer4.0.bn3.weight”, “backbone.body.layer4.0.bn3.bias”, “backbone.body.layer4.0.bn3.running_mean”, “backbone.body.layer4.0.bn3.running_var”, “backbone.body.layer4.1.conv1.weight”, “backbone.body.layer4.1.bn1.weight”, “backbone.body.layer4.1.bn1.bias”, “backbone.body.layer4.1.bn1.running_mean”, “backbone.body.layer4.1.bn1.running_var”, “backbone.body.layer4.1.conv2.weight”, “backbone.body.layer4.1.bn2.weight”, “backbone.body.layer4.1.bn2.bias”, “backbone.body.layer4.1.bn2.running_mean”, “backbone.body.layer4.1.bn2.running_var”, “backbone.body.layer4.1.conv3.weight”, “backbone.body.layer4.1.bn3.weight”, 
“backbone.body.layer4.1.bn3.bias”, “backbone.body.layer4.1.bn3.running_mean”, “backbone.body.layer4.1.bn3.running_var”, “backbone.body.layer4.2.conv1.weight”, “backbone.body.layer4.2.bn1.weight”, “backbone.body.layer4.2.bn1.bias”, “backbone.body.layer4.2.bn1.running_mean”, “backbone.body.layer4.2.bn1.running_var”, “backbone.body.layer4.2.conv2.weight”, “backbone.body.layer4.2.bn2.weight”, “backbone.body.layer4.2.bn2.bias”, “backbone.body.layer4.2.bn2.running_mean”, “backbone.body.layer4.2.bn2.running_var”, “backbone.body.layer4.2.conv3.weight”, “backbone.body.layer4.2.bn3.weight”, “backbone.body.layer4.2.bn3.bias”, “backbone.body.layer4.2.bn3.running_mean”, “backbone.body.layer4.2.bn3.running_var”, “backbone.fpn.fpn_inner1.weight”, “backbone.fpn.fpn_inner1.bias”, “backbone.fpn.fpn_layer1.weight”, “backbone.fpn.fpn_layer1.bias”, “backbone.fpn.fpn_inner2.weight”, “backbone.fpn.fpn_inner2.bias”, “backbone.fpn.fpn_layer2.weight”, “backbone.fpn.fpn_layer2.bias”, “backbone.fpn.fpn_inner3.weight”, “backbone.fpn.fpn_inner3.bias”, “backbone.fpn.fpn_layer3.weight”, “backbone.fpn.fpn_layer3.bias”, “backbone.fpn.fpn_inner4.weight”, “backbone.fpn.fpn_inner4.bias”, “backbone.fpn.fpn_layer4.weight”, “backbone.fpn.fpn_layer4.bias”, “rpn.anchor_generator.cell_anchors.0”, “rpn.anchor_generator.cell_anchors.1”, “rpn.anchor_generator.cell_anchors.2”, “rpn.anchor_generator.cell_anchors.3”, “rpn.anchor_generator.cell_anchors.4”, “rpn.head.conv.weight”, “rpn.head.conv.bias”, “rpn.head.cls_logits.weight”, “rpn.head.cls_logits.bias”, “rpn.head.bbox_pred.weight”, “rpn.head.bbox_pred.bias”, “roi_heads.box.feature_extractor.fc6.weight”, “roi_heads.box.feature_extractor.fc6.bias”, “roi_heads.box.feature_extractor.fc7.weight”, “roi_heads.box.feature_extractor.fc7.bias”, “roi_heads.box.predictor.cls_score.weight”, “roi_heads.box.predictor.cls_score.bias”, “roi_heads.box.predictor.bbox_pred.weight”, “roi_heads.box.predictor.bbox_pred.bias”.
Unexpected key(s) in state_dict: “”.
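
An unexpected key of "" together with every model key missing suggests that the loop ran over something other than the parameter names. One possibility, offered as a guess rather than a diagnosis, is that the file holds a wrapper dictionary whose top-level keys are all nine characters or shorter, so k[9:] turns each of them into an empty string. The sketch below uses a made-up checkpoint layout (the keys model, optimizer and iteration are assumptions, not taken from the file above) to show the effect and one way to look for the nested state dict first.

```python
from collections import OrderedDict

# Hypothetical checkpoint layout; real files may use different keys.
checkpoint = {
    "model": OrderedDict([("module.backbone.body.stem.conv1.weight", 1)]),
    "optimizer": {},
    "iteration": 50,
}

# Slicing the top-level keys collapses all of them into one empty key.
sliced = OrderedDict((k[9:], v) for k, v in checkpoint.items())
print(list(sliced.keys()))


def find_state_dict(obj, candidates=("state_dict", "model")):
    """Return the nested mapping that holds the parameters, if any.

    `candidates` lists wrapper keys to try; both names are assumptions.
    Falls back to `obj` itself when none of them is present.
    """
    for name in candidates:
        if isinstance(obj, dict) and isinstance(obj.get(name), dict):
            return obj[name]
    return obj


state_dict = find_state_dict(checkpoint)
prefix = "module."
cleaned = OrderedDict(
    (k[len(prefix):] if k.startswith(prefix) else k, v)
    for k, v in state_dict.items()
)
print(list(cleaned.keys()))
```

Printing the top-level keys of the loaded object, as suggested earlier in the thread, shows whether this applies to a given file.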

1 Like

Yup, same here. In my case it’s e.g. “synth_reader.0.weight” and “synth_reader.module.0.weight”, so the replace works like a charm.

Can you explain more clearly how to add a nn.DataParallel temporarily in your network for loading purposes? e.g. can you provide a simple example?
I am new to pytorch, thanks so much!

You can use strict=False in load_state_dict. This can solve the issue.

model.load_state_dict(checkpoint['state_dict'], strict=False)
2 Likes

I would recommend caution in using strict=False here. I tried this, and replacing module. worked correctly and gave me the validation results as expected from my model, while strict=False gave wrong results (even though PyTorch did not complain).
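
The reason strict=False can give wrong results without complaint is that keys which do not match are simply skipped, so the corresponding layers keep their initial weights. A small sketch of a check that could be run before loading is shown below. compare_keys is a hypothetical helper and the key lists are made up for illustration. Recent PyTorch versions also return the missing and unexpected keys from load_state_dict, which can be inspected in the same way.

```python
def compare_keys(model_keys, checkpoint_keys):
    """Return (missing, unexpected) key lists, sorted for stable output."""
    model_keys = set(model_keys)
    checkpoint_keys = set(checkpoint_keys)
    missing = sorted(model_keys - checkpoint_keys)      # model wants, file lacks
    unexpected = sorted(checkpoint_keys - model_keys)   # file has, model lacks
    return missing, unexpected


# Illustrative keys: a plain model and a checkpoint saved from a wrapped one.
model_keys = ["fc.weight", "fc.bias"]
checkpoint_keys = ["module.fc.weight", "module.fc.bias"]

missing, unexpected = compare_keys(model_keys, checkpoint_keys)
if missing or unexpected:
    # With strict=False, none of these weights would actually be loaded.
    print("missing:", missing)
    print("unexpected:", unexpected)
```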

4 Likes