Modifying the model checkpoint

Let’s say I have trained a ResNet-50 model and replaced its last fc layer with a custom-defined layer that performs L2 normalisation on the output of the average-pooling layer, making the last few layers look like this:
```
-----previous layers------
  )
  (avgpool): AdaptiveAvgPool2d(output_size=(1, 1))
  (fc): Sequential(
    (0): L2Normalization()
  )
)
```
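
For reference, a minimal sketch of such a layer (assuming it simply wraps torch.nn.functional.normalize; my actual implementation may differ slightly):

```python
import torch.nn as nn
import torch.nn.functional as F

class L2Normalization(nn.Module):
    """Sketch: L2-normalise each feature vector in the batch."""
    def forward(self, x):
        # normalise along the feature dimension so each row has unit L2 norm
        return F.normalize(x, p=2, dim=1)
```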

I have saved the model as a checkpoint containing its state dictionary. Now let’s say I want to add a few layers to perform feature reduction, and I also throw in a BatchNorm layer, so that my model looks like this:
```
-----previous layers------
  )
  (avgpool): AdaptiveAvgPool2d(output_size=(1, 1))
  (fc): Sequential(
    (0): Linear(in_features=2048, out_features=1024, bias=True)
    (1): BatchNorm1d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): Linear(in_features=1024, out_features=256, bias=True)
    (3): L2Normalization()
  )
)
```
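
I build this head by reassigning model.fc, roughly like this (a sketch, reusing the L2Normalization module from above):

```python
import torch.nn as nn
from torchvision import models

model = models.resnet50()  # weights come from my earlier checkpoint, loaded separately

# replace the stock fc with the feature-reduction head shown above
model.fc = nn.Sequential(
    nn.Linear(2048, 1024),
    nn.BatchNorm1d(1024),  # num_features must match the preceding Linear's out_features
    nn.Linear(1024, 256),
    L2Normalization(),     # the custom module defined earlier
)
```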

Now I want to train this model on the same data, so it would be beneficial to load the weights from the previous model (for all layers other than the newly added ones), but a plain load fails because of the layer mismatch.
Is there any way I could modify the checkpoint to hold some random values for the extra layers that are not present in it, or is there anything else I could do to load the values for the layers that are present and randomly initialise the newly added ones?
I am new to PyTorch and deep learning, so I feel a bit lost here; feel free to let me know if I am doing anything incorrectly.

The simplest approach might be to load the state_dict into the model before adding the new layers. Alternatively, you could add the new layers’ key-value pairs to the state_dict directly, since it’s derived from a dict. Both options are sketched below.
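
A minimal sketch of the first approach, assuming the checkpoint was saved as a dict under the key "model_state_dict" in a file called checkpoint.pth (adjust both to your setup):

```python
import torch
import torch.nn as nn
from torchvision import models

checkpoint = torch.load("checkpoint.pth", map_location="cpu")

# 1) rebuild the model exactly as it was when the checkpoint was saved
model = models.resnet50()
model.fc = nn.Sequential(L2Normalization())  # the custom module from the question
model.load_state_dict(checkpoint["model_state_dict"])  # all shapes match, loads cleanly

# 2) now swap in the new head; its layers keep their fresh random initialisation
model.fc = nn.Sequential(
    nn.Linear(2048, 1024),
    nn.BatchNorm1d(1024),
    nn.Linear(1024, 256),
    L2Normalization(),
)
```

And the second approach: since L2Normalization has no parameters, the old state_dict simply lacks the fc.* entries, so you can copy the freshly initialised tensors from the new model into it before loading:

```python
new_model = models.resnet50()
new_model.fc = nn.Sequential(
    nn.Linear(2048, 1024),
    nn.BatchNorm1d(1024),
    nn.Linear(1024, 256),
    L2Normalization(),
)

state_dict = checkpoint["model_state_dict"]
for key, tensor in new_model.state_dict().items():
    if key not in state_dict:         # e.g. "fc.0.weight", "fc.1.running_mean", ...
        state_dict[key] = tensor      # keep the default random initialisation
new_model.load_state_dict(state_dict)
```

A related option is new_model.load_state_dict(state_dict, strict=False), which skips missing and unexpected keys; if you use it, check the returned missing_keys and unexpected_keys to confirm that only the new fc layers were skipped.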
