Problem loading model from state dict

kirk86 · January 16, 2018, 4:18pm

Hi everyone, I am getting an error when trying to load the model from a saved checkpoint file.

Model:

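import torch.nn as nn
import torch.nn.functional as F


class RecNet2(nn.Module):
    def __init__(self, in_shape, num_classes=12):
        super(RecNet2, self).__init__()

        # in_shape[-1] is the per-timestep feature size fed to the GRU
        self.layer1 = nn.GRU(in_shape[-1], 256)
        self.layer2 = nn.LSTM(256, 512)
        self.fc1 = nn.Linear(512, 256)
        self.fc2 = nn.Linear(40 * 256, num_classes)

    def forward(self, x):
        x, h_out = self.layer1(x)
        # training=self.training disables dropout in eval mode
        x = F.dropout(x, p=0.5, training=self.training)
        x, h_out2 = self.layer2(x)
        x = F.dropout(x, p=0.3, training=self.training)
        x = self.fc1(x)
        x = self.fc2(x.view(-1, 40 * 256))

        return x  # logits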

Error message:

>>> model.load_state_dict(fle)
Traceback (most recent call last):
  File "/home/user/miniconda3/envs/torch/lib/python3.6/site-packages/torch/nn/modules/module.py", line 482, in load_state_dict
    own_state[name].copy_(param)
RuntimeError: invalid argument 2: sizes do not match at /opt/conda/conda-bld/pytorch_1512387374934/work/torch/lib/THC/generic/THCTensorCopy.c:101

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/user/miniconda3/envs/torch/lib/python3.6/site-packages/torch/nn/modules/module.py", line 487, in load_state_dict
    .format(name, own_state[name].size(), param.size()))
RuntimeError: While copying the parameter named layer1.weight_ih_l0, whose dimensions in the model are torch.Size([768]) and whose dimensions in the checkpoint are torch.Size([768, 101]).

The .pth file was created while training the model above. Any suggestions on how to investigate this, and why the tensor shapes do not match?

smth · January 18, 2018, 3:42am

That says a layer of your model has changed between the checkpoint and the current definition. In your case it is the nn.GRU.
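
One way to investigate a mismatch like this is to compare parameter names and shapes on both sides before calling load_state_dict. The sketch below is only an illustration: find_shape_mismatches is a hypothetical helper, not part of PyTorch, and it uses stand-in objects that carry just a .shape so it runs without a checkpoint file. With a real model one would presumably pass model.state_dict() and the dict returned by torch.load(path). For reference, nn.GRU stores weight_ih_l0 with shape (3 * hidden_size, input_size), so the checkpoint's [768, 101] corresponds to hidden_size 256 and input_size 101.

```python
from types import SimpleNamespace


def find_shape_mismatches(model_state, checkpoint_state):
    """Compare two state-dict-like mappings (name -> object with a .shape).

    Hypothetical helper. Returns (missing, unexpected, mismatched):
      missing     names the model expects but the checkpoint lacks
      unexpected  names in the checkpoint that the model does not have
      mismatched  (name, model_shape, checkpoint_shape) for shared names
    """
    missing = sorted(set(model_state) - set(checkpoint_state))
    unexpected = sorted(set(checkpoint_state) - set(model_state))
    mismatched = []
    for name in sorted(set(model_state) & set(checkpoint_state)):
        model_shape = tuple(model_state[name].shape)
        ckpt_shape = tuple(checkpoint_state[name].shape)
        if model_shape != ckpt_shape:
            mismatched.append((name, model_shape, ckpt_shape))
    return missing, unexpected, mismatched


# Stand-ins carrying only a .shape, using the sizes from the error message.
# Assumption: with PyTorch these would be model.state_dict() and the loaded
# checkpoint dict, whose tensors also expose .shape.
model_state = {"layer1.weight_ih_l0": SimpleNamespace(shape=(768,))}
checkpoint_state = {"layer1.weight_ih_l0": SimpleNamespace(shape=(768, 101))}

missing, unexpected, mismatched = find_shape_mismatches(model_state, checkpoint_state)
for name, model_shape, ckpt_shape in mismatched:
    print(f"{name}: model {model_shape} vs checkpoint {ckpt_shape}")
```

Any entry in mismatched points at a layer whose constructor arguments differ from the ones used when the checkpoint was saved.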

kirk86 · January 30, 2018, 2:15pm

Thanks Soumith. Appreciate it!

Dinesh_Shrestha · October 2, 2019, 4:13pm

RuntimeError Traceback (most recent call last)
in ()
1 model = CaptionModel_B(2048, 50, 160, vocab_size, num_layers=1)
----> 2 model.load_state_dict(torch.load('im_caption_35.727_0.316_epoch_20.pth.tar', map_location='cpu'))
3 solver = NetSolver(data, model)

/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py in load_state_dict(self, state_dict, strict)
843 if len(error_msgs) > 0:
844 raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
--> 845 self.__class__.__name__, "\n\t".join(error_msgs)))
846 return _IncompatibleKeys(missing_keys, unexpected_keys)
847

RuntimeError: Error(s) in loading state_dict for CaptionModel_B:
size mismatch for rnn.embed.weight: copying a param with shape torch.Size([9080, 50]) from checkpoint, the shape in current model is torch.Size([8947, 50]).
size mismatch for rnn.linear.weight: copying a param with shape torch.Size([9080, 160]) from checkpoint, the shape in current model is torch.Size([8947, 160]).
size mismatch for rnn.linear.bias: copying a param with shape torch.Size([9080]) from checkpoint, the shape in current model is torch.Size([8947]).

Can you help me to solve this error?
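
All three mismatched parameters differ only in their first dimension (9080 in the checkpoint, 8947 in the current model), which suggests the vocab_size passed to CaptionModel_B is not the one used during training. A minimal sketch, assuming the checkpoint is a plain state dict and that rnn.embed.weight has one row per vocabulary entry: read the trained size from the checkpoint and compare it with the current one. vocab_size_from_checkpoint is a hypothetical helper, and the stand-in objects below only mimic the shapes reported in the error.

```python
from types import SimpleNamespace


def vocab_size_from_checkpoint(state_dict, key="rnn.embed.weight"):
    """Return the first dimension of the embedding weight stored under `key`.

    Hypothetical helper; assumes the embedding has one row per vocabulary entry.
    """
    return tuple(state_dict[key].shape)[0]


# Stand-in for the dict returned by torch.load(..., map_location='cpu'),
# holding only the shapes reported in the error message.
checkpoint = {
    "rnn.embed.weight": SimpleNamespace(shape=(9080, 50)),
    "rnn.linear.weight": SimpleNamespace(shape=(9080, 160)),
    "rnn.linear.bias": SimpleNamespace(shape=(9080,)),
}

current_vocab_size = 8947  # size of the current model, per the error message
trained_vocab_size = vocab_size_from_checkpoint(checkpoint)

if trained_vocab_size != current_vocab_size:
    print(
        f"checkpoint was trained with vocab_size={trained_vocab_size}, "
        f"current vocab_size={current_vocab_size}"
    )
```

Building the model with the trained size makes the shapes line up, but the word-to-index mapping also has to be the one used in training, otherwise the loaded embeddings will be assigned to the wrong words.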