Changing the architecture of a model & using old pretrained weights

I have a PyTorch model and its weights in a .pth file. I would like to remove two layers of the model, both from the architecture and from the parameters, so I can keep using the pretrained weights. I've changed the architecture and deleted the weights and biases of those two layers from the state dictionary, but when I try to load it into the new architecture I get an error about a parameter group not matching.

    Traceback (most recent call last):
      File "main.py", line 318, in <module>
        main()
      File "main.py", line 165, in main
        net.load_state_dict(new_state_dict)#, strict=False)
      File ".../python3.7/site-packages/torch/optim/optimizer.py", line 171, in load_state_dict
        raise ValueError("loaded state dict contains a parameter group "
    ValueError: loaded state dict contains a parameter group that doesn't match the size of optimizer's group

This is what I did to the state dictionary to get rid of those parameters:

    del state_dict['classifier.0.weight']
    del state_dict['classifier.0.bias']
    del state_dict['classifier.1.weight']
    del state_dict['classifier.1.bias']
    del state_dict['classifier.1.running_mean']
    del state_dict['classifier.1.running_var']
    del state_dict['classifier.1.num_batches_tracked']
    del state_dict['classifier.2.act_quant.fused_activation_quant_proxy.tensor_quant.scaling_impl.value']
    del state_dict['classifier.4.weight']
    del state_dict['classifier.4.bias']
    del state_dict['classifier.5.weight']
    del state_dict['classifier.5.bias']
    del state_dict['classifier.5.running_mean']
    del state_dict['classifier.5.running_var']
    del state_dict['classifier.5.num_batches_tracked']
    del state_dict['classifier.6.act_quant.fused_activation_quant_proxy.tensor_quant.scaling_impl.value']

However, the optimizer state dict has different, ambiguous keys (plain integers), so I cannot tell which entries need to be removed.

    optimizer_dict['state'].keys()
    dict_keys([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126])

How can I fix this or go about it in a better way?

It seems you are using a framework that also saves the optimizer's state dict.
The error is not about the model's parameters but about the optimizer's parameter groups: the optimizer keeps a per-parameter entry for every trainable parameter in the net, so the entries belonging to the removed layers have to be deleted from the saved optimizer dict as well. Alternatively, modify your saving/loading functions so they don't restore the optimizer state at all.
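
The integer keys you see in `optimizer_dict['state']` are positional: they index into the concatenated `'params'` lists of `optimizer_dict['param_groups']`, in the order the parameters were originally handed to the optimizer (usually `model.parameters()`). Below is a minimal sketch of pruning the saved optimizer dict; `prune_optimizer_state` is a hypothetical helper, and the assumption is that the optimizer was built from the original net's parameters so the ids follow `named_parameters()` order:

    import torch

    def prune_optimizer_state(optimizer_dict, old_net, removed_prefixes):
        """Drop saved optimizer entries for the parameters of removed layers.

        Sketch only: assumes the optimizer was created from
        old_net.parameters(), so the integer ids in the saved dict follow
        old_net.named_parameters() order. removed_prefixes is e.g.
        ['classifier.'] -- adjust to whichever layers you deleted.
        """
        param_names = [name for name, _ in old_net.named_parameters()]
        removed = {i for i, name in enumerate(param_names)
                   if name.startswith(tuple(removed_prefixes))}
        # Remove the ids from every param group so the group sizes match
        # an optimizer built on the pruned model...
        for group in optimizer_dict['param_groups']:
            group['params'] = [i for i in group['params'] if i not in removed]
        # ...and drop their per-parameter state (momentum buffers etc.).
        for i in removed:
            optimizer_dict['state'].pop(i, None)
        return optimizer_dict

When you then call `load_state_dict` on an optimizer created from the new model, PyTorch pairs the remaining saved ids with the new parameters in order, so the group-size check passes and the surviving state lands on the right tensors.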

Note that a stateless optimizer such as plain SGD (without momentum) keeps no per-parameter state, so there is nothing to migrate; in that case the simplest fix is to skip loading the old optimizer entirely and create a fresh one on the new model.
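
A minimal sketch of that route, assuming the checkpoint stores the model weights under a `'model'` key (adjust to your file layout): load only the model's state dict with `strict=False` so leftover or missing keys are skipped rather than raising, and build the optimizer from scratch.

    import torch

    def load_pruned(net, checkpoint_path='checkpoint.pth'):
        """Restore only the model weights into the pruned net.

        Sketch only: the 'model' key and checkpoint path are assumptions
        about your checkpoint layout; the optimizer starts fresh instead
        of being loaded.
        """
        state_dict = torch.load(checkpoint_path, map_location='cpu')['model']
        # strict=False skips mismatched keys and reports them, so you can
        # verify that only the intended layers are affected.
        missing, unexpected = net.load_state_dict(state_dict, strict=False)
        print('missing keys:', missing)
        print('unexpected keys:', unexpected)
        return torch.optim.SGD(net.parameters(), lr=0.01)  # fresh, stateless SGD

You lose the accumulated momentum/moment estimates this way, but after removing layers that is usually acceptable, since the remaining weights need to re-adapt anyway.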