How to change the name of the weights to a new name when saving the model


#1

How can I change the names of the weights in a model when I save them?

Here is what I want to do:

I use torch.load to load the pretrained model and update the weights for self.Conv1 (where
self.Conv1 = nn.Conv2d(3,3,kernel_size=1, stride=1, padding=0, bias=False))

After training my model, I would like to save the weights of self.Conv1 as self.Conv1_V2, because my new model has an identical layer called self.Conv1_V2 that I want to initialize with the weights of self.Conv1.

How can I do that?


#2

I found a way to deal with it, but I'm not sure whether it is correct…
After loading my first model I can do
self.Conv1_V2 = self.Conv1

and then when I save this model it will contain both self.Conv1_V2 and self.Conv1, so I can use self.Conv1_V2.

Does this make sense?


#3

Alternatively you could manipulate the names in your state_dict before saving it.
This would avoid having “zombie” layers in your model, which are not used anymore.
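
A minimal sketch of that idea, assuming the layer definitions from the first post. The class names OldNet and NewNet are made up for illustration, and the in-memory buffer only stands in for a real file path:

```python
import io
from collections import OrderedDict

import torch
import torch.nn as nn


# Hypothetical stand-ins for the trained model and the new model;
# only the layer names matter here.
class OldNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.Conv1 = nn.Conv2d(3, 3, kernel_size=1, stride=1, padding=0, bias=False)


class NewNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.Conv1_V2 = nn.Conv2d(3, 3, kernel_size=1, stride=1, padding=0, bias=False)


old_model = OldNet()

# Rename every key that belongs to Conv1; leave all other keys unchanged.
renamed = OrderedDict()
for key, value in old_model.state_dict().items():
    if key.startswith("Conv1."):
        key = "Conv1_V2." + key[len("Conv1."):]
    renamed[key] = value

# Save the renamed state_dict (a file path works the same way as this buffer).
buffer = io.BytesIO()
torch.save(renamed, buffer)
buffer.seek(0)

# The new model can now load the weights under the new name.
new_model = NewNet()
new_model.load_state_dict(torch.load(buffer))
print(torch.equal(new_model.Conv1_V2.weight, old_model.Conv1.weight))
```

Only the saved dictionary is changed, so the original model keeps its layer names and no extra attribute is added to it.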


#4

That is a very useful way to manipulate names in the state_dict. However, I'm not sure how to do it…
Could you please explain a little more or provide an example of how to do that?

I also have another question.
When I run the following code:

load_name = os.path.join(input_dir, 'trfcn_1_1_52.pth')
checkpoint = torch.load(load_name)
checkpoint['model'].keys()

The output is:

odict_keys(['RCNN_rpn.RPN_Conv.weight', 'RCNN_rpn.RPN_Conv.bias', 'RCNN_rpn.RPN_cls_score.weight', 'RCNN_rpn.RPN_cls_score.bias', 'RCNN_rpn.RPN_bbox_pred.weight', 'RCNN_rpn.RPN_bbox_pred.bias', 'InsideNet.Conv1.weight', 'InsideNet.Conv3.weight', 'InsideNet.Conv4.weight', 'RCNN_bbox_base.weight', 'RCNN_cls_base.weight'])

Is there a way to keep just

'InsideNet.Conv1.weight', 'InsideNet.Conv3.weight', 'InsideNet.Conv4.weight'

and delete the rest?

Thanks a lot


#5

This code should rename the conv1 parameters:

import copy

import torch
import torch.nn as nn
import torch.nn.functional as F


class ModelA(nn.Module):
    def __init__(self):
        super(ModelA, self).__init__()
        self.conv1 = nn.Conv2d(1, 6, 3, 1, 1)
        self.fc1 = nn.Linear(6*4*4, 2)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = x.view(x.size(0), -1)  # flatten before the linear layer
        x = self.fc1(x)
        return x


class ModelB(nn.Module):
    def __init__(self):
        super(ModelB, self).__init__()
        self.conv1_v2 = nn.Conv2d(1, 6, 3, 1, 1)
        self.fc1 = nn.Linear(6*4*4, 2)

    def forward(self, x):
        x = F.relu(self.conv1_v2(x))
        x = x.view(x.size(0), -1)  # flatten before the linear layer
        x = self.fc1(x)
        return x

modelA = ModelA()
modelB = ModelB()
state_dict = modelA.state_dict()
state_dict_v2 = copy.deepcopy(state_dict)
# Iterate over the original keys and rename conv1.* to conv1_v2.* in the copy
for key in state_dict:
    if 'conv1' in key:
        pre, post = key.split('.')
        state_dict_v2[pre+'_v2'+'.'+post] = state_dict_v2.pop(key)

modelB.load_state_dict(state_dict_v2)
# Check that modelB now holds modelA's conv weights under the new name
print((modelB.conv1_v2.weight == modelA.conv1.weight).all())

If you want to filter out some parameters, you could adapt this dict comprehension with your condition:

state_dict_filt = {k: v for k, v in state_dict.items() if 'fc1' in k}
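
Applied to the checkpoint keys posted above, the same comprehension could look roughly like this. The dict below is only a stand-in for checkpoint['model'] with placeholder tensors, since the real file isn't available here:

```python
import torch

# Stand-in for checkpoint['model']; the keys are taken from the post,
# the tensor values are arbitrary placeholders.
model_state = {
    "RCNN_rpn.RPN_Conv.weight": torch.zeros(1),
    "RCNN_rpn.RPN_Conv.bias": torch.zeros(1),
    "InsideNet.Conv1.weight": torch.zeros(1),
    "InsideNet.Conv3.weight": torch.zeros(1),
    "InsideNet.Conv4.weight": torch.zeros(1),
    "RCNN_bbox_base.weight": torch.zeros(1),
}

# Keep only the entries whose key starts with the InsideNet prefix.
filtered = {k: v for k, v in model_state.items() if k.startswith("InsideNet.")}
print(list(filtered.keys()))
```

The filtered dict can then be saved with torch.save, or loaded into a model that contains only those layers.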

#6

Hmm, I'm a little confused.
I mean, when you rename conv1 to conv1_v2, why would load_state_dict(state_dict_v2) update the weights of conv1 with those of conv1_v2? They have the same name…
Also, I'm using strict=False in my load_state_dict (because my two models are not exactly the same), if that matters.


#7

I thought that was your use case, i.e. saving a model with certain layer names and loading these parameters into another model that uses other layer names.
Would that work for you? Alternatively, one could probably manipulate the layers using getattr and setattr, but I’m not a huge fan of it, since it might break other stuff (e.g. hooks), although I’m not sure about it.
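
For completeness, a rough sketch of the getattr/setattr route, with the caveats above still applying. The forward method is left out on purpose: it would have to reference the new attribute name, otherwise the model breaks at call time.

```python
import torch
import torch.nn as nn


class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 6, 3, 1, 1)
        self.fc1 = nn.Linear(6 * 4 * 4, 2)
    # forward omitted: it would need to call self.conv1_V2 after the rename


model = Net()

# Move the registered submodule from the old attribute name to the new one.
layer = getattr(model, "conv1")
delattr(model, "conv1")
setattr(model, "conv1_V2", layer)

keys = list(model.state_dict().keys())
print(keys)
```

After this the state_dict contains conv1_V2.* instead of conv1.*, so a checkpoint saved from this model uses the new name.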


#8

I'm sorry to come back to this question.
Here is what I'm trying to do, in a very simple example.
I have this simple model:

import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 6, 3, 1, 1)
        self.fc1 = nn.Linear(6*4*4, 2)
        
    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = self.fc1(x)
        return x
    
model = Net()
print(model)

If I print it, I get:

Net(
(conv1): Conv2d (1, 6, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(fc1): Linear(in_features=96, out_features=2)
)

I want it to be:

Net(
(conv1_V2): Conv2d (1, 6, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(fc1): Linear(in_features=96, out_features=2)
)

I want to do this before I start training, so that when I save the model later it no longer contains conv1 but conv1_V2 instead.
After saving and reloading this model, conv1_V2 is going to be used in another network. Is there a way to do it?

So far I have been using the "zombie" approach, but I noticed that when I do that, conv1_V2 does not show up in model.named_parameters().

P.S. By conv1_V2 not showing up in model.named_parameters() I mean:


import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 6, 3, 1, 1)
        self.fc1 = nn.Linear(6*4*4, 2)
        self.conv1_v2 = self.conv1

        
    def forward(self, x):
        x = F.relu(self.conv1_v2(x))
        x = self.fc1(x)
        return x
    
model = Net()
print(model)
params=[]
for key, value in dict(model.named_parameters()).items():
    if value.requires_grad:
        print(key)

Net(
(conv1): Conv2d (1, 6, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(fc1): Linear(in_features=96, out_features=2)
(conv1_v2): Conv2d (1, 6, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
)
conv1.weight
conv1.bias
fc1.weight
fc1.bias


#9

conv1_v2 won’t show up in the parameters, as it’s basically just pointing to conv1.
The underlying layer will be trained, since conv1 is in the parameters.
If you compare the gradients of conv1 and conv1_v2, you’ll see that they are identical.

Would that work for your use case?
If not, could you explain your use case a bit?
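
A small sketch to check this, reusing the Net from the post above. The flatten step in forward and the 4x4 input size are my additions so that the forward pass actually runs:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 6, 3, 1, 1)
        self.fc1 = nn.Linear(6 * 4 * 4, 2)
        self.conv1_v2 = self.conv1  # second name for the same module

    def forward(self, x):
        x = F.relu(self.conv1_v2(x))
        x = x.view(x.size(0), -1)  # flatten (added so the linear layer fits)
        return self.fc1(x)


model = Net()
model(torch.randn(2, 1, 4, 4)).sum().backward()

# Both names refer to one module, so parameters and gradients are shared.
same_module = model.conv1_v2 is model.conv1
same_grad = torch.equal(model.conv1.weight.grad, model.conv1_v2.weight.grad)

# named_parameters() skips duplicates, while state_dict() lists both names.
param_names = [name for name, _ in model.named_parameters()]
state_keys = list(model.state_dict().keys())

print(same_module, same_grad)
print(param_names)
print(state_keys)
```

Since the state_dict lists the layer under both names, a checkpoint saved from this model can be loaded by a model that only defines conv1_v2, as long as the leftover conv1.* keys are dropped or strict=False is used.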


#10

Yes, that actually works.
Your explanation made it much clearer what is going on.
Thanks