[solved] KeyError: 'unexpected key "module.encoder.embedding.weight" in state_dict'

(Wasi Ahmad) #1

I am getting the following error while trying to load a saved model.

KeyError: 'unexpected key "module.encoder.embedding.weight" in state_dict'

This is the function I am using to load a saved model. (as suggested in http://pytorch.org/docs/notes/serialization.html#recommend-saving-models)

def load_model_states(model, tag):
    """Load a previously saved model states."""
    filename = os.path.join(args.save_path, tag)
    with open(filename, 'rb') as f:

The model is a sequence-to-sequence network whose init function (constructor) is given below.

def __init__(self, dictionary, embedding_index, max_sent_length, args):
    """"Constructor of the class."""
    super(Sequence2Sequence, self).__init__()
    self.dictionary = dictionary
    self.embedding_index = embedding_index
    self.config = args
    self.encoder = Encoder(len(self.dictionary), self.config)
    self.decoder = AttentionDecoder(len(self.dictionary), max_sent_length, self.config)
    self.criterion = nn.NLLLoss()  # Negative log-likelihood loss

    # Initializing the weight parameters for the embedding layer in the encoder.
    self.encoder.init_embedding_weights(self.dictionary, self.embedding_index, self.config.emsize)

When I print the model (sequence-to-sequence network), I get the following.

Sequence2Sequence (
  (encoder): Encoder (
    (drop): Dropout (p = 0.25)
    (embedding): Embedding(43723, 300)
    (rnn): LSTM(300, 300, batch_first=True, dropout=0.25)
  (decoder): AttentionDecoder (
    (embedding): Embedding(43723, 300)
    (attn): Linear (600 -> 12)
    (attn_combine): Linear (600 -> 300)
    (drop): Dropout (p = 0.25)
    (out): Linear (300 -> 43723)
    (rnn): LSTM(300, 300, batch_first=True, dropout=0.25)
  (criterion): NLLLoss (

So, module.encoder.embedding is an embedding layer, and module.encoder.embedding.weight represents the associated weight matrix. So, why it says- unexpected key "module.encoder.embedding.weight" in state_dict?

'unexpected key "module.module.module.conv1.weight" in state_dict'
Usage of nn.DataParallel
How to store model which trained by multi GPUs(DataParallel)?
Loading weights for CPU model while trained on GPU
Missing keys & unexpected keys in state_dict when loading self trained model
Problem loading saved state_dict
Problem loading pretrained weights - missing keys
(Francisco Massa) #2

You probably saved the model using nn.DataParallel, which stores the model in module, and now you are trying to load it without DataParallel. You can either add a nn.DataParallel temporarily in your network for loading purposes, or you can load the weights file, create a new ordered dict without the module prefix, and load it back.

'unexpected key "module.module.module.conv1.weight" in state_dict'
(Wasi Ahmad) #3

Yes, I used nn.DataParallel. I didn’t understand your second suggestion. Loading the weights file, create a new ordered dict without the module prefix and load it back. (Can you provide an example?)

Are you suggesting something like this? (example taken from here - https://github.com/OpenNMT/OpenNMT-py/blob/master/train.py)

model_state_dict = model.module.state_dict() if len(opt.gpus) > 1 else model.state_dict()
model_state_dict = {k: v for k, v in model_state_dict.items() if 'generator' not in k}
generator_state_dict = model.generator.module.state_dict() if len(opt.gpus) > 1 else model.generator.state_dict()
#  (4) drop a checkpoint
checkpoint = {
	'model': model_state_dict,
	'generator': generator_state_dict,
	'dicts': dataset['dicts'],
	'opt': opt,
	'epoch': epoch,
	'optim': optim
		   '%s_acc_%.2f_ppl_%.2f_e%d.pt' % (opt.save_model, 100*valid_acc, valid_ppl, epoch))  

May I ask you one question about the above code snippet, what is generator here?

Error: 'DataParallel' object has no attribute 'items'
(Francisco Massa) #4

I was thinking about something like the following:

# original saved file with DataParallel
state_dict = torch.load('myfile.pth.tar')
# create new OrderedDict that does not contain `module.`
from collections import OrderedDict
new_state_dict = OrderedDict()
for k, v in state_dict.items():
    name = k[7:] # remove `module.`
    new_state_dict[name] = v
# load params

IndexError: When performing advanced indexing the indexing objects must be LongTensors or convertible to LongTensors
Error when loading pre-trained model
(Wasi Ahmad) #5

First of all, thanks a lot, by adding a nn.DataParallel temporarily in my network for loading purposes worked. Even I tried your second suggested approach, it worked for me as well. Thanks a lottt :slight_smile:


@wasiahmad By adding nn.DataParallel temporarily into your network did you have to have the same number of GPUs available to load the model as when you saved the model?

(Tingfan Wu) #8

A related question, given the fact we see that saving DataParallel wrapped model can cause problems when the model_state_dict is loaded into an unwrapped model. Would one recommend to save the “unwrapped” ‘module’ field inside a DataParallel instance instead ?

(Eugenio Culurciello) #9

here is our way for alexnet trained with pytorch examples imagenet:

(Daniel Du) #11

This works for me. Thanks a lot !

(dat duong) #12

I am having the same problem, and using the trick with OrderedDict does not work. I am using pytorch 0.3 in the case anything has changed.

I have an word embedding layer that was trained along with the classification task. Training was successful, but loading the model gave the error

Traceback (most recent call last): File "source/test.py", line 72, in <module> helper.load_model_states_from_checkpoint(model, args.save_path + 'model_best.pth.tar', 'state_dict', args.cuda) File "/u/flashscratch/flashscratch1/d/datduong/universalSentenceEncoder/source/helper.py", line 55, in load_model_states_from_checkpoint model.load_state_dict(checkpoint[tag]) File "/u/home/d/datduong/project/anaconda3/lib/python3.5/site-packages/torch/nn/modules/module.py", line 490, in load_state_dict .format(name)) KeyError: 'unexpected key "embedding.embedding.embedding.weight" in state_dict'

The key embedding.embedding.embedding.weight exists (see image). Please let me know what to do.

(Oktai Tatanov) #13

In my opinion, this question-answer should be in something FAQ :slight_smile:

(Wyn Mew) #14

Check out your saved model file:

check_point = torch.load('myfile.pth.tar')

You may find out your ‘check_point’ got several keys such as ‘state_dict’ etc.

checkpoint = torch.load(resume)
state_dict =checkpoint['state_dict']

from collections import OrderedDict
new_state_dict = OrderedDict()
for k, v in state_dict.items():
    name = k[7:] # remove 'module.' of dataparallel


(dashesy) #15

What about nn.DistributedDataParallel, it seems DistributedDataParallel and DataParallel can load each other’s parameters.
Is there an official way to save/load among DDP/DP/None?

(17764591637) #16

just do this:

model = torch.load(train_model)


it works for me!

Loading a pre-trained model

Thanks a lot, this worked for me.

(Alon) #19

Instead of deleting the “module.” string from all the state_dict keys, you can save your model with:
torch.save(model.module.state_dict(), path_to_file)
instead of
torch.save(model.state_dict(), path_to_file)
that way you don’t get the “module.” string to begin with…

(Stoneyang) #20

Thanks for your hints! It saved my time:rose:

(Amy Hu) #21

that’s work simple and perfect for me! thanks

(Kaiyang) #22

In case someone needs, this function can handle loading weights w/ and w/o ‘module’.

To save model without ‘module’, you may try this.

(Lucas Teixeira) #23

After pytorch 1.xx
this was fixed, now you only need to do this

            if isinstance(args.pretrained, torch.nn.DataParallel):
                args.pretrained = args.pretrained.module