I recently dug out an archived densenet121 checkpoint from 2017 and tried to resurrect it using PyTorch 1.4.0.
It seems the parameter naming convention has changed since then.
When I load the same architecture under the newer PyTorch version with the legacy state_dict, I encounter the following error:
RuntimeError: Error(s) in loading state_dict for DataParallel:
Missing key(s) in state_dict: "module.densenet121.features.denseblock1.denselayer1.norm1.weight", "module.densenet121.features.denseblock1.denselayer1.norm1.bias", ......
Unexpected key(s) in state_dict: "module.densenet121.features.denseblock1.denselayer1.norm.1.weight", "module.densenet121.features.denseblock1.denselayer1.norm.1.bias", ......
Clearly, at some point the keys used a layer.#.param format (e.g. norm.1.weight), which was later changed to layer#.param (e.g. norm1.weight), presumably because dots are no longer allowed in module names.
Is there an existing tool to automatically and generally update the old key names to the newer convention?
I’m not aware of such a tool, but I think it should be easy to remove the unwanted indexing from all keys in the dict. Are you trying to only remove these index numbers or do you have any other mismatches?
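For the DenseNet case specifically, a regex rename should do it. Here is a minimal sketch, assuming the extra dot before the index is the only mismatch (the checkpoint path is hypothetical, and model stands for your DataParallel-wrapped architecture):

import re
import torch

state_dict = torch.load('densenet121_2017.pth', map_location='cpu')  # hypothetical path

# dots are no longer allowed in module names, so e.g. "norm.1" became "norm1";
# this is the same pattern torchvision applies to old DenseNet checkpoints
pattern = re.compile(
    r'^(.*denselayer\d+\.(?:norm|relu|conv))\.((?:[12])\.(?:weight|bias|running_mean|running_var))$')

for key in list(state_dict.keys()):
    res = pattern.match(key)
    if res:
        state_dict[res.group(1) + res.group(2)] = state_dict.pop(key)

# strict=False may be needed if buffers such as num_batches_tracked
# (added in later PyTorch versions) are missing from the old checkpoint
model.load_state_dict(state_dict)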
Hello, I have the same problem. I had already trained a model, but have since changed the names of the layers. Given the description above, how do I then match the old checkpoint to the model’s state_dict?
Thank you in advance for your help!
Hello @ptrblck and @kaltu, I am facing a similar problem. I want to change the names of two keys in the model’s state_dict. I tried the suggested method, but I am getting errors. My code snippet and the error message are below:
import torch
import torch.nn as nn
import torch.nn.functional as F

class LeNet(nn.Module):
    def __init__(self, num_classes=43, input_channels=3):
        super(LeNet, self).__init__()
        self.conv1 = nn.Conv2d(input_channels, 6, 5)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, num_classes)
        if 1 == num_classes:
            # compatible with nn.BCELoss
            self.softmax = nn.Sigmoid()
        else:
            # compatible with nn.CrossEntropyLoss
            self.softmax = nn.LogSoftmax(dim=1)

    def forward(self, x):
        out = F.relu(self.conv1(x))
        out = F.max_pool2d(out, 2)
        out = F.relu(self.conv2(out))
        out = F.max_pool2d(out, 2)
        out = out.view(out.size(0), -1)
        out = F.relu(self.fc1(out))
        out = F.relu(self.fc2(out))
        out = self.fc3(out)
        out = self.softmax(out)
        return out
teacher_model = LeNet()  # get the model
checkpoint = torch.load('model_best.pth.tar', map_location=device)
state_dict = checkpoint['state_dict']
for key in list(state_dict.keys()):
    state_dict[key.replace("conv1.weight", "features.0.weight").replace("conv1.bias", "features.0.bias")] = state_dict.pop(key)
teacher_model.load_state_dict(checkpoint['state_dict'])
It is giving me the following error:
Error(s) in loading state_dict for LeNet:
Missing key(s) in state_dict: "conv1.weight", "conv1.bias".
Unexpected key(s) in state_dict: "features.0.weight", "features.0.bias".
Based on the error message it seems that you are replacing the needed conv1 keys with unexpected features keys. What kind of error were you seeing before trying to manipulate it?
@ptrblck Thank you for the response. Initially, I had no errors and was able to load the model with the old keys. But to use this model in the energy calculation framework, it requires the key names "features.0.weight" and "features.0.bias" instead of "conv1.weight" and "conv1.bias", respectively. Therefore, I am trying to modify the key names and then load the model. Please help me solve the problem.
I don’t quite understand the issue, since your manipulation is creating the mismatches.
Did you maybe forget to modify the model before using load_state_dict?
If your state_dict contains parameters stored as features, which should be loaded in the current conv layers, then yes: change the layer names in the model or alternatively the keys of the state_dict.
Based on your currently posted code: your model uses the conv layer names, the state_dict seems to use the same keys, you then rename them to features, and so you run into the error.
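For completeness, here is a minimal self-contained sketch of the second option, renaming the state_dict keys to match a model that registers its layer under features (the toy modules and the key mapping are assumptions for illustration):

import torch
import torch.nn as nn

# toy "old" module: the parameters live under "conv1"
old = nn.Module()
old.conv1 = nn.Conv2d(3, 6, 5)

# toy "new" module: the same layer is registered under "features.0"
new = nn.Module()
new.features = nn.Sequential(nn.Conv2d(3, 6, 5))

# map the old key names onto the new layout before loading
key_map = {'conv1.weight': 'features.0.weight',
           'conv1.bias': 'features.0.bias'}
renamed = {key_map.get(k, k): v for k, v in old.state_dict().items()}
new.load_state_dict(renamed)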
Please include complete code showing how to load a “checkpoint.pth.tar” file and then use your code to change the keys as you mentioned, so that we can use model.load_state_dict(torch.load(PATH)) for prediction afterwards.
Neither pooling layers nor Flatten or Dropout need any parameters or buffers, so I’m unsure how or why you would like to add buffers to them.
There is “nothing” to store in the state_dict for these layers.
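You can verify this directly; a quick sketch:

import torch.nn as nn

# parameter-free layers have an empty state_dict: nothing to save or load
for layer in [nn.MaxPool2d(2), nn.Flatten(), nn.Dropout(p=0.5)]:
    print(type(layer).__name__, dict(layer.state_dict()))
# MaxPool2d {}
# Flatten {}
# Dropout {}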