How to load part of a pre-trained model?

So the names like (conv_r1) are the keys, and they correspond to the names I used when I created the variables, right? These are the keys?

Yes.
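For example (a minimal sketch; the layer sizes here are just illustrative), the attribute name you give a module becomes the key prefix in the state dict:

import torch.nn as nn

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv_r1 = nn.Conv2d(3, 16, 3)   # attribute name -> key prefix

net = Net()
print(list(net.state_dict().keys()))
# ['conv_r1.weight', 'conv_r1.bias']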

# 1. filter out unnecessary keys
pretrained_dict = {k: v for k, v in pretrained_dict.items() if k in model_dict}
# 2. overwrite entries in the existing state dict
model_dict.update(pretrained_dict) 
# 3. load the new state dict
model.load_state_dict(model_dict)

Dear smth, differently from the above code in this forum, I tried to use the following one:

net.state_dict().update(pretrain_dict)

But it seems paradoxical to me: since net.state_dict() has already been updated, why should I still use the following step?

net.load_state_dict(net.state_dict())

In other words, why do I still need load_state_dict() after net.state_dict().update(pretrain_dict)?

PS.
Sorry, I later realized that

net.state_dict().update(pretrain_dict)

is wrong, but I do not know why.
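The reason it is wrong: state_dict() builds and returns a fresh OrderedDict every time it is called, so dict.update() only rebinds keys in that throwaway dict and never touches the module's parameters; load_state_dict() is what actually copies values into the parameters. A minimal sketch (tiny layer just for illustration):

import torch
import torch.nn as nn

net = nn.Linear(2, 2)
pretrain_dict = {'weight': torch.ones(2, 2), 'bias': torch.zeros(2)}

net.state_dict().update(pretrain_dict)   # only updates a temporary dict
print(net.weight.data)                   # unchanged random init

net.load_state_dict(pretrain_dict)       # copies the values into the parameters
print(net.weight.data)                   # now all ones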


Thanks, it’s really helpful.

BTW, if the keys are the same but the sizes are different, this argument can't handle it. We need to remove the variables with different sizes from the pretrained model first, then load the dict with strict=False.

I have the same problem but I did not solve it.

Edit:
Just realised that the problem is solved by strict=False in model.load_state_dict(state_dict, strict=False).

Edit2:
Don't know what I did, but model.load_state_dict(state_dict, strict=False) does not work after all if the shapes do not match. So my previous edit is invalid. The solution below should work.

In order to check for the size as well, I think one can just substitute

# filter unnecessary keys
pretrained_dict = {k: v for k, v in pretrained_dict.items() if k in model_dict}

with

# filter unnecessary keys
pretrained_dict = {k: v for k, v in pretrained_dict.items()
                   if (k in model_dict) and (model_dict[k].shape == pretrained_dict[k].shape)}

I think dict.update() and model.load_state_dict() are two completely different processes.

Thank you so much!

Hi, can I load pretrained weights into my new model, which has one more layer than the pretrained one?
I applied the code and it seems to load the model without problems, but an error happens during training saying the dimensions of the weights do not match the input.

Does anyone have an idea how I can solve this, please? (BTW, the one extra layer I added is just a batch norm layer.)
Thanks!

As long as your layers have the same names in both models, you can load the weights from the previously trained model using the strict=False argument, like so: modelB.load_state_dict(torch.load(PATH), strict=False)
See here for detailed info.
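A minimal sketch of that (the module names features/bn below are made up): with strict=False, the extra batch norm layer simply keeps its default initialization and is reported in missing_keys:

import torch
import torch.nn as nn

class ModelA(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Linear(8, 8)

class ModelB(nn.Module):          # same layers plus one extra batch norm
    def __init__(self):
        super().__init__()
        self.features = nn.Linear(8, 8)
        self.bn = nn.BatchNorm1d(8)

torch.save(ModelA().state_dict(), 'modelA.pth')

modelB = ModelB()
result = modelB.load_state_dict(torch.load('modelA.pth'), strict=False)
print(result.missing_keys)        # the batch norm parameters and buffers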


To load a smaller model into a bigger model (whose .pth is available, of course) with corresponding layers (e.g. after making some modifications to a model, maybe adding some layers), this can be done as follows (pretrained_dict is the state dictionary of the available pre-trained model):

pretrained_dict = torch.load(PATH)   # or filter it first, as in the snippets above
bigger_model = BiggerModel()

for name in bigger_model.state_dict().keys():
    if name in pretrained_dict.keys():
        # copy the matching tensors into the bigger model in place
        bigger_model.state_dict()[name].copy_(pretrained_dict[name])

To load only specific layers, the pretrained_dict can be modified accordingly too.
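For example (a minimal sketch; the conv1/conv2 prefixes below are hypothetical names), keeping only a couple of layers before the copy loop:

# keep only entries whose names start with the chosen prefixes
wanted_prefixes = ('conv1.', 'conv2.')
pretrained_dict = {k: v for k, v in pretrained_dict.items()
                   if k.startswith(wanted_prefixes)}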

A clear answer:

# load part of the pre trained model
# save
torch.save(pre_model.state_dict(), PATH)

# load
pretrained_dict = torch.load(PATH)
model = TheModelClass(*args, **kwargs)
model_dict = model.state_dict()
# 1. filter out unnecessary keys
pretrained_dict = {k: v for k, v in pretrained_dict.items() if k in model_dict}
# 2. overwrite entries in the existing state dict
model_dict.update(pretrained_dict) 
# 3. load the new state dict
model.load_state_dict(model_dict)

Could someone please help me load the first 4 layers from a pre-existing model? I have the same thing that @yufeng posted above, but I'm having trouble with the path in the second line, pretrained_dict = torch.load(PATH). What are the requirements for the path? Does it have to be a certain type of file? I have an ipynb that I want to use as the pre-existing model, but when I put the file name in torch.load(path) I get an error that the file or directory does not exist. Can someone help me fix this?
Thank you so much!!!
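In case it helps: torch.load() expects a file previously written with torch.save() (typically a .pt/.pth file holding a state dict), not a notebook. A minimal sketch (resnet18 used only as a stand-in for the pre-existing model):

import torch
import torchvision.models as models

# in the notebook/script that has the pre-existing model:
pre_model = models.resnet18()
torch.save(pre_model.state_dict(), 'pretrained.pth')

# later, wherever you want to reuse the weights:
pretrained_dict = torch.load('pretrained.pth')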

Now this is supported by ml-logger.

To manipulate the prefix of the keys in a checkpoint file, you can do:

from ml_logger import logger

net = models.ResNet()
logger.load_module(
    net,
    path="/checkpoint/geyang/resnet.pkl",
    matcher=lambda d, k, p: d[k.replace('embed.', '')])

To fill in missing keys:

from ml_logger import logger

net = models.ResNet()
logger.load_module(
    net,
    path="/checkpoint/geyang/resnet.pkl",
    matcher=lambda d, k, p: d[k] if k in d else p[k])

For detailed doc see: https://ml-logger.readthedocs.io/en/latest/modindex.html?highlight=load%20module#ml_logger.ML_Logger.load_module

I agree with you. You are right.

A detailed example on this:

I found what looks like a bug in state_dict.update(): if the state dict's type is torch.cuda.float, the update method will transform it to torch.float. Maybe PyTorch should keep the type the same after the update.

I am trying to load the vgg19 network with a modified number of input channels. The number of input channels is 4 in my case, and I am also changing the classifier to my own classifier. I have also removed the Adaptive Average Pooling layer from the network. How should I load the pre-trained weights into the modified version of my model?

I think the easiest way would be to load the weights into the unmodified model first, and then modify your model as you wish.
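A minimal sketch of that order of operations for the 4-channel vgg19 case (the exact layer indices and classifier shape below are assumptions and depend on your setup):

import torch.nn as nn
import torchvision.models as models

# 1. load the pretrained weights into the unmodified architecture
model = models.vgg19(pretrained=True)

# 2. then swap in the modified parts; only these start from scratch
model.features[0] = nn.Conv2d(4, 64, kernel_size=3, padding=1)  # 4 input channels
model.avgpool = nn.Identity()                                    # drop the adaptive avg pool
model.classifier = nn.Linear(512 * 7 * 7, 10)                    # your own classifier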


I loaded vgg16 with pretrained weights and pruned some filters of each conv2d layer; they are now in an OrderedDict (layer_weight). How can I save it in .pth format?

import numpy as np
from collections import OrderedDict

# `model` below is the pretrained vgg16's feature extractor mentioned above
# (e.g. models.vgg16(pretrained=True).features)

layer_f_index = {}
with open('/home/sahar/sqenv/faster-rcnn.pytorch/Untitled/first_method.txt', 'r') as g:
    rankinglist = g.readlines()
    for i in rankinglist[:100]:
        layer = i.split(',')[0]
        index = i.split(',')[1]
        layer_f_index.setdefault(layer, []).append(index)  # ranking list

layer_weight = OrderedDict()
layer_bias = OrderedDict()
for layer in (0, 2, 5, 7, 10, 12, 14, 17, 19, 21, 24, 26, 28):
    _, conv = list(model._modules.items())[layer]
    old_weights = conv.weight.data.cpu().numpy()
    layer_weight[layer] = old_weights
    old_biases = conv.bias.data.cpu().numpy()
    layer_bias[layer] = old_biases


def find_adjacents(value, list_layer):
    # index of the conv layer that follows `value` in the layer list
    ind = list_layer.index(value)
    return list_layer[ind + 1]


for key in layer_f_index:
    all_idx = np.arange(layer_weight[int(key)].shape[0])
    keep = []
    for x in all_idx:
        if str(x) not in layer_f_index[key]:
            keep.append(x)

    # keep only the filters that were not pruned
    layer_weight[int(key)] = layer_weight[int(key)][keep, :, :, :]
    layer_bias[int(key)] = layer_bias[int(key)][keep]
    if int(key) < 28:
        conv_layers = [0, 2, 5, 7, 10, 12, 14, 17, 19, 21, 24, 26, 28]
        next_input = find_adjacents(int(key), conv_layers)
        # drop the matching input channels of the next conv layer
        layer_weight[next_input] = layer_weight[next_input][:, keep, :, :]

You could use torch.save directly to store an OrderedDict:

import torch
from collections import OrderedDict

d = OrderedDict([
    ('a', torch.tensor(1)), ('b', torch.tensor(2))
])
torch.save(d, 'tmp.pth')
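Applied to the pruned dicts from the post above (layer_weight / layer_bias), something along these lines should work, and the file can be read back with torch.load:

torch.save({'weight': layer_weight, 'bias': layer_bias}, 'pruned_filters.pth')
checkpoint = torch.load('pruned_filters.pth')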