How to load part of a pre-trained model?

I loaded a pretrained VGG16 and pruned some filters from each conv2d layer; the pruned weights are now in an OrderedDict (layer_weight). How can I save it in .pth format?

import numpy as np
from collections import OrderedDict

# parse the ranking list: each line is "layer_index,filter_index"
layer_f_index = {}
with open('/home/sahar/sqenv/faster-rcnn.pytorch/Untitled/first_method.txt', 'r') as g:
    rankinglist = g.readlines()
    for line in rankinglist[:100]:
        layer = line.split(',')[0]
        index = line.split(',')[1].strip()
        layer_f_index.setdefault(layer, []).append(index)  # filters to prune per layer

# collect the weights and biases of every conv layer of VGG16
layer_weight = OrderedDict()
layer_bias = OrderedDict()
for layer in (0, 2, 5, 7, 10, 12, 14, 17, 19, 21, 24, 26, 28):
    _, conv = list(model._modules.items())[layer]
    layer_weight[layer] = conv.weight.data.cpu().numpy()
    layer_bias[layer] = conv.bias.data.cpu().numpy()


def find_adjacents(value, list_layer):
    # return the index of the conv layer that follows `value`
    ind = list_layer.index(value)
    return list_layer[ind + 1]


for key in layer_f_index:
    # keep only the filters that are not listed for pruning
    all_idx = np.arange(layer_weight[int(key)].shape[0])
    keep_idx = [x for x in all_idx if str(x) not in layer_f_index[key]]

    layer_weight[int(key)] = layer_weight[int(key)][keep_idx, :, :, :]
    layer_bias[int(key)] = layer_bias[int(key)][keep_idx]
    # also prune the corresponding input channels of the next conv layer
    if int(key) < 28:
        conv_layers = [0, 2, 5, 7, 10, 12, 14, 17, 19, 21, 24, 26, 28]
        next_input = find_adjacents(int(key), conv_layers)
        layer_weight[next_input] = layer_weight[next_input][:, keep_idx, :, :]

You could use torch.save directly to store an OrderedDict:

from collections import OrderedDict
import torch

d = OrderedDict([
    ('a', torch.tensor(1)), ('b', torch.tensor(2))
])
torch.save(d, 'tmp.pth')
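For completeness, a minimal sketch of loading it back (reusing the placeholder filename 'tmp.pth' from the snippet above):

d = torch.load('tmp.pth')  # restored as an OrderedDict of tensors
print(d['a'], d['b'])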

Genius. This is a useful solution for frequent cases, e.g. when you want a different last fully-connected layer in your model, say because you want to train on a different number of classes. Thanks!

Thanks for the reply.
I trained the VGG and saved the model as a .pth file, then loaded it to prune some of its filters.
After pruning, the last conv layer no longer has 512 output channels, since some filters are gone.
Pruning the last conv layer therefore affects the first linear layer of the classifier, which is (512 * 7 * 7, 4096).
How can I prune the input weights of the classifier according to the last conv layer?
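A minimal sketch of one way to do this, assuming keep_idx holds the channel indices that survived in the last conv layer and that the classifier input is flattened channel-first as 512 × 7 × 7 (keep_idx and the 500-channel figure below are hypothetical, not from the thread):

import torch
import torch.nn as nn
import torchvision

model = torchvision.models.vgg16(pretrained=True)
keep_idx = list(range(500))  # hypothetical: pretend 12 filters were pruned from the last conv layer

old_fc = model.classifier[0]  # nn.Linear(512 * 7 * 7, 4096)
# view the input dimension as (channels, 7, 7) and drop the pruned channels
w = old_fc.weight.data.view(4096, 512, 7, 7)[:, keep_idx, :, :]

new_fc = nn.Linear(len(keep_idx) * 7 * 7, 4096)
new_fc.weight.data = w.reshape(4096, -1).clone()
new_fc.bias.data = old_fc.bias.data.clone()
model.classifier[0] = new_fc

The same index list of course has to be the one used to prune the last conv layer's output filters, so the feature map and the classifier stay consistent.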

However, if I just use
model.fc = nn.Linear(in_dim, out_dim)  # only the output dimension changes; the key "fc" stays the same
then both the "pretrained_dict" and the "model_dict" contain the key "fc".
Will the key-based filtering still work to update the state dict?
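Filtering by key alone would keep the old "fc" tensor in that case, and load_state_dict would then fail with a size-mismatch error. A common workaround (a sketch, not taken from the posts above) is to compare shapes as well as keys:

pretrained_dict = {k: v for k, v in pretrained_dict.items()
                   if k in model_dict and v.size() == model_dict[k].size()}
model_dict.update(pretrained_dict)
model.load_state_dict(model_dict)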


Could you tell me how they are different?

This is a beautiful solution. I verified it based on your idea and it works pretty well.

import torch
import torch.nn as nn
import torch.nn.functional as F
class ModelA(torch.nn.Module):
    def __init__(self):
        super(ModelA, self).__init__()
        self.A = torch.nn.Linear(2, 3)
        self.B = torch.nn.Linear(3, 4)
        self.C = torch.nn.Linear(4, 4)
        self.D = torch.nn.Linear(4, 3)
    def forward(self, x):
        x = F.relu(self.A(x))
        x = F.relu(self.B(x))
        x = F.relu(self.C(x))
        x = F.relu(self.D(x))
        return x
class ModelB(torch.nn.Module):
    def __init__(self):
        super(ModelB, self).__init__()
        self.A = torch.nn.Linear(2, 3)
        self.B = torch.nn.Linear(3, 4)
        self.C = torch.nn.Linear(4, 4)
        self.E = torch.nn.Linear(4, 2)
    def forward(self, x):
        x = F.relu(self.A(x))
        x = F.relu(self.B(x))
        x = F.relu(self.C(x))
        x = F.relu(self.E(x))
        return x
modelA = ModelA()
modelA_dict = modelA.state_dict()
print('-'*40)
for key in sorted(modelA_dict.keys()):
    parameter = modelA_dict[key]
    print(key)
    print(parameter.size())
    print(parameter)
modelB = ModelB()
modelB_dict = modelB.state_dict()
print('-'*40)
for key in sorted(modelB_dict.keys()):
    parameter = modelB_dict[key]
    print(key)
    print(parameter.size())
    print(parameter)
print('-'*40)
print("modelB is going to use the ABC layers parameters from modelA")
pretrained_dict = modelA_dict
model_dict = modelB_dict
# 1. filter out unnecessary keys
pretrained_dict = {k: v for k, v in pretrained_dict.items() if k in model_dict}
# 2. overwrite entries in the existing state dict
model_dict.update(pretrained_dict)
# 3. load the new state dict
modelB.load_state_dict(model_dict)
modelB_dict = modelB.state_dict()
for key in sorted(modelB_dict.keys()):
    parameter = modelB_dict[key]
    print(key)
    print(parameter.size())
    print(parameter)

Output

----------------------------------------
A.bias
torch.Size([3])
tensor([ 0.4012, -0.3587,  0.6650])
A.weight
torch.Size([3, 2])
tensor([[ 0.5574,  0.4757],
        [-0.3795, -0.4850],
        [ 0.2248, -0.3578]])
B.bias
torch.Size([4])
tensor([ 0.1353, -0.3448,  0.4272, -0.1463])
B.weight
torch.Size([4, 3])
tensor([[-0.4960,  0.2930,  0.1822],
        [-0.4309, -0.4259, -0.3604],
        [ 0.2976,  0.2279, -0.3805],
        [-0.2423, -0.2915,  0.5130]])
C.bias
torch.Size([4])
tensor([-0.2964, -0.3516, -0.2900,  0.2390])
C.weight
torch.Size([4, 4])
tensor([[ 0.0877,  0.4150, -0.1938,  0.3659],
        [-0.3505,  0.1734, -0.1803,  0.2914],
        [ 0.3375, -0.2661,  0.4651,  0.0041],
        [-0.1866,  0.0055,  0.0230,  0.0502]])
D.bias
torch.Size([3])
tensor([0.2733, 0.3856, 0.2848])
D.weight
torch.Size([3, 4])
tensor([[ 0.4498,  0.4846, -0.2461,  0.1043],
        [-0.1462, -0.1684,  0.0155, -0.2861],
        [-0.2750,  0.3607,  0.4295, -0.3481]])
----------------------------------------
A.bias
torch.Size([3])
tensor([-0.2486, -0.3553, -0.3503])
A.weight
torch.Size([3, 2])
tensor([[ 0.1880, -0.6102],
        [-0.1288,  0.6273],
        [ 0.1040, -0.5014]])
B.bias
torch.Size([4])
tensor([ 0.2349,  0.1911, -0.5200, -0.1111])
B.weight
torch.Size([4, 3])
tensor([[ 0.3223,  0.4178, -0.1244],
        [-0.2392,  0.5335, -0.4440],
        [-0.4544,  0.3134,  0.1886],
        [-0.3317,  0.2892, -0.5672]])
C.bias
torch.Size([4])
tensor([ 0.4484,  0.3125, -0.1636, -0.1316])
C.weight
torch.Size([4, 4])
tensor([[-0.1965, -0.3447, -0.4057, -0.2020],
        [-0.3002,  0.0170, -0.0360,  0.2502],
        [ 0.3630, -0.2502,  0.2334, -0.1819],
        [ 0.1432,  0.1483, -0.2965, -0.0004]])
E.bias
torch.Size([2])
tensor([-0.1594,  0.4471])
E.weight
torch.Size([2, 4])
tensor([[ 0.0461, -0.3409,  0.3723, -0.1613],
        [-0.0548,  0.3238, -0.2238,  0.1237]])
----------------------------------------
modelB is going to use the ABC layers parameters from modelA
A.bias
torch.Size([3])
tensor([ 0.4012, -0.3587,  0.6650])
A.weight
torch.Size([3, 2])
tensor([[ 0.5574,  0.4757],
        [-0.3795, -0.4850],
        [ 0.2248, -0.3578]])
B.bias
torch.Size([4])
tensor([ 0.1353, -0.3448,  0.4272, -0.1463])
B.weight
torch.Size([4, 3])
tensor([[-0.4960,  0.2930,  0.1822],
        [-0.4309, -0.4259, -0.3604],
        [ 0.2976,  0.2279, -0.3805],
        [-0.2423, -0.2915,  0.5130]])
C.bias
torch.Size([4])
tensor([-0.2964, -0.3516, -0.2900,  0.2390])
C.weight
torch.Size([4, 4])
tensor([[ 0.0877,  0.4150, -0.1938,  0.3659],
        [-0.3505,  0.1734, -0.1803,  0.2914],
        [ 0.3375, -0.2661,  0.4651,  0.0041],
        [-0.1866,  0.0055,  0.0230,  0.0502]])
E.bias
torch.Size([2])
tensor([-0.1594,  0.4471])
E.weight
torch.Size([2, 4])
tensor([[ 0.0461, -0.3409,  0.3723, -0.1613],
        [-0.0548,  0.3238, -0.2238,  0.1237]])

I wonder if your code can be replaced by simply setting the parameter strict to False, like this:

pretrained_dict = torch.load(pretrain_se_path)
model_dict = model.state_dict()
# Filter out unnecessary keys
pretrained_dict = {k: v for k, v in pretrained_dict.items() if k in model_dict}
model.load_state_dict(pretrained_dict, strict=False)

Would it be the same? @ptrblck @zeakey

Using strict=False should work and would ignore all additional or missing keys.
However, the explicit filtering is often clearer for debugging purposes, so I don't have a strong preference towards either approach.
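As a small debugging aid (a sketch, not part of the original answer): load_state_dict reports the keys it skipped, so even with strict=False you can inspect what was left out:

incompatible = model.load_state_dict(pretrained_dict, strict=False)
print('missing keys:   ', incompatible.missing_keys)      # present in the model but not in the checkpoint
print('unexpected keys:', incompatible.unexpected_keys)   # present in the checkpoint but not in the model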


I face a problem when I also want to load the optimizer state while the sizes are inconsistent. Can anyone help me? Thank you so much!

This one is great! Thanks so much for the clean solution!

current_model = net.state_dict()
keys_vin = torch.load('', map_location=device)  # checkpoint path left empty in the original post

# take the checkpoint tensor when its shape matches, otherwise keep the model's own tensor
# (note: zipping keys with values assumes both state dicts list their entries in the same order)
new_state_dict = {k: v if v.size() == current_model[k].size() else current_model[k]
                  for k, v in zip(current_model.keys(), keys_vin['model_state_dict'].values())}
net.load_state_dict(new_state_dict)

This small piece of code should also work if the state dicts differ by a handful of layers.

Great, many thanks! This is the most useful and simple solution. People should give it more hearts so others notice it. Thanks!

Bravo, bro! You saved my time!!!

what is this behavior @bgfmac :joy:

I have a solution for this that works in most cases. It finds a correspondence between two sets of weights by finding a maximum subgraph isomorphism or embedding (depending on the configuration).

I’ve implemented it in torch-liberator: torch-liberator · PyPI

For more details see the Pytorch Hackathon 2021 prize page, where I won 3rd place with the algorithm and implementation: TorchLiberator - Partial Weight Loading | Devpost

I have a 3-minute YouTube video describing it: https://www.youtube.com/watch?v=GQqtn61iNsc