Error in state_dict

Johan_pow · December 26, 2018, 6:11pm

I am trying to load a model I have trained, but I get the following error and can’t understand how to solve it.

## error: 
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-53-fb3124c9db63> in <module>
     38     return model
     39 
---> 40 model = load_checkpoint('test_MODEL.pt')

<ipython-input-53-fb3124c9db63> in load_checkpoint(checkpoint_path)
     34 
     35 
---> 36     model.load_state_dict(checkpoint['state_dict'])
     37 
     38     return model

/opt/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py in load_state_dict(self, state_dict, strict)
    767         if len(error_msgs) > 0:
    768             raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
--> 769                                self.__class__.__name__, "\n\t".join(error_msgs)))
    770 
    771     def _named_members(self, get_members_fn, prefix='', recurse=True):

RuntimeError: Error(s) in loading state_dict for ResNet:
	Missing key(s) in state_dict: "fc.weight", "fc.bias", "classifier.fc1.weight", "classifier.fc1.bias", "classifier.fc2.weight", "classifier.fc2.bias". 
	Unexpected key(s) in state_dict: "fc.fc1.weight", "fc.fc1.bias", "fc.fc2.weight", "fc.fc2.bias".

Code to load is the following:

import torch
import json
from torch import nn
import torch.nn.functional as F
from torchvision import models

class RES50Classifier(nn.Module):

    def __init__(self, in_features, hidden_features, 
                       out_features, drop_prob=0.1):
        super().__init__()

        self.fc1 = nn.Linear(in_features, hidden_features)
        self.fc2 = nn.Linear(hidden_features, out_features)
        self.drop = nn.Dropout(p=drop_prob)

    def forward(self, x):
        x = self.drop(F.relu(self.fc1(x)))
        x = self.fc2(x)
        x = F.log_softmax(x, dim=1)
        return x

    
def load_checkpoint(checkpoint_path):
    checkpoint = torch.load(checkpoint_path)
    model = models.resnet50(pretrained=False)

    for param in model.parameters():
        param.requires_grad = False

    # Put the classifier on the pretrained network
    model.classifier = RES50Classifier(2048, 500, 102)
    
    
    
    model.load_state_dict(checkpoint['state_dict'])
    
    return model

model = load_checkpoint('test_MODEL.pt')

The code I use to define the model is the following:

class RES50Classifier(nn.Module):

    def __init__(self, in_features, hidden_features, 
                       out_features, drop_prob=0.1):
        super().__init__()

        self.fc1 = nn.Linear(in_features, hidden_features)
        self.fc2 = nn.Linear(hidden_features, out_features)
        self.drop = nn.Dropout(p=drop_prob)

    def forward(self, x):
        x = self.drop(F.relu(self.fc1(x)))
        x = self.fc2(x)
        x = F.log_softmax(x, dim=1)
        return x

from torch import optim  # needed for optim.Adam below

model = models.resnet50(pretrained=True)
for param in model.parameters():
    param.requires_grad = False
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model.fc = RES50Classifier(2048, 500, 102)



criterion = nn.CrossEntropyLoss()

# Only train the classifier parameters, feature parameters are frozen
optimizer = optim.Adam(model.fc.parameters(), lr=0.003)    



model.to(device);

I am running everything on a Google VM instance with PyTorch 1.0. Thanks in advance.

smth · December 27, 2018, 12:28am

The error says that the checkpoint is missing the keys fc.weight, fc.bias and classifier.* that your current model expects. This can happen if you change the source code after saving the checkpoint.
From the unexpected keys, it looks like fc used to be a Module containing two nn.Linear layers, i.e. fc.fc1 and fc.fc2. Your source code is different now: fc is an nn.Linear layer, and classifier.fc1 is an nn.Linear layer.

Adjust your source code according to your checkpoint.
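
For anyone debugging a similar mismatch, here is a minimal sketch of how the two lists in the error can be reproduced by comparing key names directly. The helper name diff_state_dict_keys is hypothetical (it is not part of PyTorch), and the key lists are copied from the error message in the original post, with "conv1.weight" standing in for the many backbone keys both sides share.

```python
def diff_state_dict_keys(model_keys, checkpoint_keys):
    """Return (missing, unexpected) key lists, mirroring the two lists in the error.

    missing:    keys the model expects but the checkpoint does not contain
    unexpected: keys the checkpoint contains but the model does not define
    """
    model_keys = list(model_keys)
    checkpoint_keys = list(checkpoint_keys)
    model_set, checkpoint_set = set(model_keys), set(checkpoint_keys)
    missing = [k for k in model_keys if k not in checkpoint_set]
    unexpected = [k for k in checkpoint_keys if k not in model_set]
    return missing, unexpected


# Keys the loading code's model expects (plain resnet `fc` plus a `classifier` head).
model_keys = [
    "conv1.weight",  # stand-in for the shared backbone keys (assumption)
    "fc.weight", "fc.bias",
    "classifier.fc1.weight", "classifier.fc1.bias",
    "classifier.fc2.weight", "classifier.fc2.bias",
]

# Keys stored in the checkpoint (the head was saved under `fc`).
checkpoint_keys = [
    "conv1.weight",
    "fc.fc1.weight", "fc.fc1.bias",
    "fc.fc2.weight", "fc.fc2.bias",
]

missing, unexpected = diff_state_dict_keys(model_keys, checkpoint_keys)
print("Missing:", missing)
print("Unexpected:", unexpected)
```

In a real session the same helper could be called with model.state_dict().keys() and checkpoint['state_dict'].keys() before calling load_state_dict, which makes it easier to see which attribute name differs.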

Johan_pow · December 27, 2018, 2:06pm

Thanks!

The problem was that I used model.classifier = RES50Classifier(2048, 500, 102) in the loading code. I changed it to model.fc = RES50Classifier(2048, 500, 102), and now it works!
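
A small sketch of why the attribute name matters, for readers who land here with the same error. TinyBackbone and Head are hypothetical stand-ins for resnet50 and RES50Classifier, shrunk so the example runs without torchvision or a saved file; the only point is that the attribute a submodule is assigned to becomes the prefix of its state_dict keys.

```python
import torch
from torch import nn


class Head(nn.Module):
    """Hypothetical stand-in for RES50Classifier (two linear layers)."""

    def __init__(self, in_features, hidden_features, out_features):
        super().__init__()
        self.fc1 = nn.Linear(in_features, hidden_features)
        self.fc2 = nn.Linear(hidden_features, out_features)


class TinyBackbone(nn.Module):
    """Hypothetical stand-in for resnet50: one conv layer plus a default `fc`."""

    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 4, kernel_size=3)
        self.fc = nn.Linear(4, 10)


# Training-time definition: the head replaces `fc`, so keys start with "fc.fc1", "fc.fc2".
trained = TinyBackbone()
trained.fc = Head(4, 8, 5)
saved_state = trained.state_dict()

# Loading with the head under a different attribute name raises RuntimeError,
# because the model now expects "fc.weight" and "classifier.fc1.*" instead.
wrong = TinyBackbone()
wrong.classifier = Head(4, 8, 5)
try:
    wrong.load_state_dict(saved_state)
    load_failed = False
except RuntimeError:
    load_failed = True

# Using the same attribute name as at training time loads cleanly.
right = TinyBackbone()
right.fc = Head(4, 8, 5)
right.load_state_dict(saved_state)
keys_match = list(right.state_dict().keys()) == list(saved_state.keys())

print("mismatched attribute failed to load:", load_failed)
print("matching attribute has identical keys:", keys_match)
```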