Loading a few layers from a pretrained MDNet

Hi,

I’m using MDNet, a network with 3 conv layers and 2 fc layers, plus branches in offline mode or a single fc layer in online mode (that layer is not pretrained; it is learned during the offline/online learning).
The thing is, I changed fc5 to add another feature, so I want to load only the weights of the 3 conv layers (conv1-conv3) and the first fc layer (fc4) for online learning. I saw the two solutions mentioned in How to load part of pre trained model?,
but I’m still not sure how to change the load_model function so the net loads only conv1, conv2, conv3, and fc4 from the .pth file (there’s no need to load the .mat file).
Any idea what exactly should I do?

Also, what exactly does this line mean ('shared_layers' is not defined anywhere)?
shared_layers = states['shared_layers']

Thanks!

Here’s the MDNet class (the forward also includes the new feature; it’s just not shown here, but that’s irrelevant):

# Assumes the repo’s usual imports and helpers: os, numpy as np, scipy.io,
# torch, torch.nn as nn, torch.nn.functional as F, OrderedDict, plus the
# repo’s LRN module and append_params function.
class MDNet(nn.Module):
    def __init__(self, model_path=None, K=1):
        super(MDNet, self).__init__()
        self.K = K
        self.layers = nn.Sequential(OrderedDict([
                ('conv1', nn.Sequential(nn.Conv2d(3, 96, kernel_size=7, stride=2),
                                        nn.ReLU(),
                                        LRN(),
                                        nn.MaxPool2d(kernel_size=3, stride=2))),
                ('conv2', nn.Sequential(nn.Conv2d(96, 256, kernel_size=5, stride=2),
                                        nn.ReLU(),
                                        LRN(),
                                        nn.MaxPool2d(kernel_size=3, stride=2))),
                ('conv3', nn.Sequential(nn.Conv2d(256, 512, kernel_size=3, stride=1),
                                        nn.ReLU())),
                ('fc4', nn.Sequential(nn.Dropout(0.5),
                                      nn.Linear(512 * 3 * 3, 512),
                                      nn.ReLU())),
                ('fc5', nn.Sequential(nn.Dropout(0.5),
                                      nn.Linear(512, 512),
                                      nn.ReLU()))]))

        self.branches = nn.ModuleList([nn.Sequential(nn.Dropout(0.5),
                                                     nn.Linear(512, 2)) for _ in range(K)])

        if model_path is not None:
            if os.path.splitext(model_path)[1] == '.pth':
                self.load_model(model_path)
            elif os.path.splitext(model_path)[1] == '.mat':
                self.load_mat_model(model_path)
            else:
                raise RuntimeError("Unknown model format: %s" % (model_path))
        self.build_param_dict()

    def build_param_dict(self):
        self.params = OrderedDict()
        for name, module in self.layers.named_children():
            append_params(self.params, module, name)
        for k, module in enumerate(self.branches):
            append_params(self.params, module, 'fc6_%d' % (k))

    def set_learnable_params(self, layers):
        for k, p in self.params.items():
            if any([k.startswith(l) for l in layers]):
                p.requires_grad = True
            else:
                p.requires_grad = False

    def get_learnable_params(self):
        params = OrderedDict()
        for k, p in self.params.items():
            if p.requires_grad:
                params[k] = p
        return params

    def forward(self, x, k=0, in_layer='conv1', out_layer='fc6'):
        # forward the model from in_layer to out_layer
        run = False
        for name, module in self.layers.named_children():
            if name == in_layer:
                run = True
            if run:
                x = module(x)
                if name == 'conv3':
                    x = x.view(x.size(0), -1)
                if name == out_layer:
                    return x

        x = self.branches[k](x)
        if out_layer == 'fc6':
            return x
        elif out_layer == 'fc6_softmax':
            return F.softmax(x, dim=1)

    def load_model(self, model_path):
        states = torch.load(model_path)
        shared_layers = states['shared_layers']
        self.layers.load_state_dict(shared_layers)

    def load_mat_model(self, matfile):
        mat = scipy.io.loadmat(matfile)
        mat_layers = list(mat['layers'])[0]

        # copy conv weights
        for i in range(3):
            weight, bias = mat_layers[i * 4]['weights'].item()[0]
            self.layers[i][0].weight.data = torch.from_numpy(np.transpose(weight, (3, 2, 0, 1)))
            self.layers[i][0].bias.data = torch.from_numpy(bias[:, 0])

I guess you are using this repo. In that case, when you load a .pth file, say mdnet_vot-otb.pth, you get a dictionary with only one key: 'shared_layers'. This corresponds to the layers trained offline.

Try it out:

import torch
model_weights = torch.load("mdnet_vot-otb.pth")
print(type(model_weights))
for k in model_weights: print(k)
for k in model_weights['shared_layers']: print("Shared layer", k)

It prints:

<class 'dict'>
shared_layers
Shared layer conv1.0.weight
Shared layer conv1.0.bias
Shared layer conv2.0.weight
Shared layer conv2.0.bias
Shared layer conv3.0.weight
Shared layer conv3.0.bias
Shared layer fc4.0.weight
Shared layer fc4.0.bias
Shared layer fc5.1.weight
Shared layer fc5.1.bias

Your MDNet object has a module called layers. I extracted its structure in the snippet below:

import torch
import torch.nn as nn
from collections import OrderedDict

layers = nn.Sequential(OrderedDict([
                ('conv1', nn.Sequential(nn.Conv2d(3, 96, kernel_size=7, stride=2),
                                        nn.ReLU(inplace=True),
                                        nn.LocalResponseNorm(2),
                                        nn.MaxPool2d(kernel_size=3, stride=2))),
                ('conv2', nn.Sequential(nn.Conv2d(96, 256, kernel_size=5, stride=2),
                                        nn.ReLU(inplace=True),
                                        nn.LocalResponseNorm(2),
                                        nn.MaxPool2d(kernel_size=3, stride=2))),
                ('conv3', nn.Sequential(nn.Conv2d(256, 512, kernel_size=3, stride=1),
                                        nn.ReLU(inplace=True))),
                ('fc4',   nn.Sequential(nn.Linear(512 * 3 * 3, 512),
                                        nn.ReLU(inplace=True))),
                ('fc5',   nn.Sequential(nn.Dropout(0.5),
                                        nn.Linear(512, 512),
                                        nn.ReLU(inplace=True)))]))
for k in layers.state_dict(): print("Module Layer", k)

And it prints:

Module Layer conv1.0.weight
Module Layer conv1.0.bias
Module Layer conv2.0.weight
Module Layer conv2.0.bias
Module Layer conv3.0.weight
Module Layer conv3.0.bias
Module Layer fc4.0.weight
Module Layer fc4.0.bias
Module Layer fc5.1.weight
Module Layer fc5.1.bias

This means the keys in shared_layers match the keys of the layers module exactly, which is why load_state_dict works here.
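
Incidentally, this also suggests a direct answer to the original question: since shared_layers is a plain dictionary, you can filter it down to the layers you want before loading. A minimal sketch, assuming model is your modified MDNet instance, the filtered keys match your model's layer names exactly, and your PyTorch version supports the strict=False argument of load_state_dict:

import torch

states = torch.load("mdnet_vot-otb.pth")
shared_layers = states['shared_layers']

# Keep only the layers whose names and shapes still match the modified network.
wanted = ('conv1', 'conv2', 'conv3', 'fc4')
filtered = {k: v for k, v in shared_layers.items() if k.startswith(wanted)}

# strict=False tells PyTorch to ignore the fc5 keys missing from the dict.
model.layers.load_state_dict(filtered, strict=False)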

Suppose I changed the fc5 linear layer to nn.Linear(512, 1024). Now, if I tried to load the weights, it wouldn’t work directly. Here’s a workaround:

import torch
import torch.nn as nn
from collections import OrderedDict

layers = nn.Sequential(OrderedDict([
                ('conv1', nn.Sequential(nn.Conv2d(3, 96, kernel_size=7, stride=2),
                                        nn.ReLU(inplace=True),
                                        nn.LocalResponseNorm(2),
                                        nn.MaxPool2d(kernel_size=3, stride=2))),
                ('conv2', nn.Sequential(nn.Conv2d(96, 256, kernel_size=5, stride=2),
                                        nn.ReLU(inplace=True),
                                        nn.LocalResponseNorm(2),
                                        nn.MaxPool2d(kernel_size=3, stride=2))),
                ('conv3', nn.Sequential(nn.Conv2d(256, 512, kernel_size=3, stride=1),
                                        nn.ReLU(inplace=True))),
                ('fc4',   nn.Sequential(nn.Linear(512 * 3 * 3, 512),
                                        nn.ReLU(inplace=True))),
                ('fc5',   nn.Sequential(nn.Dropout(0.5),
                                        nn.Linear(512, 1024),
                                        nn.ReLU(inplace=True)))]))

model_weights = torch.load("mdnet_vot-otb.pth")

d = model_weights['shared_layers']
d['fc5.1.weight'] = torch.randn((1024, 512)) * 0.01
d['fc5.1.bias'] = torch.zeros(1024)
layers.load_state_dict(d)

Interesting approach: load the weights, create newly initialized weights for the modified layer, and then load those initialized weights into the net along with the pretrained ones.
In my case I need to add a new input to fc5:

            ('fc4',   nn.Sequential(nn.Dropout(0.5),
                                    nn.Linear(512 * 3 * 3, 512),  
                                    nn.ReLU())),
            ('fc5',   nn.Sequential(nn.Dropout(0.5),
                                    nn.Linear(512+1, 512), 
                                    nn.ReLU()))]))

I changed the load_model function to:

def load_model(self, model_path):
    model_weights = torch.load(model_path)
    d = model_weights['shared_layers']
    d['fc5.1.weight'] = torch.randn((512, 513)) * 0.01
    d['fc5.1.bias'] = torch.zeros(512)
    self.layers.load_state_dict(d)

And it seems to work. I’ll just take a look at the weights of fc5.
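For example, something like this (a quick sanity check, assuming the model instance is named model):

# fc5 is Sequential(Dropout, Linear, ReLU), so the Linear layer is index 1.
w = model.layers.fc5[1].weight
print(w.shape, w.mean().item(), w.std().item())
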
Thanks a lot for your elaborate answer!

Which changes to load_model would be required if I wanted to add a new initialized conv layer (let’s call it ‘conv4’)?

You would need to change self.layers too, inserting your conv4 in the right place. You should also add two new keys to the dictionary d (say your conv4 is something like torch.nn.Conv2d(512, 512, 3)):

d['conv4.0.weight'] = torch.randn((512, 512, 3, 3)) * 0.01
d['conv4.0.bias'] = torch.zeros(512)

Note: be sure to flag these parameters as requires_grad = True, otherwise they won’t learn.
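
For instance, a minimal sketch using the set_learnable_params method from the class above (assuming your MDNet instance is named model):

# Include the new layer when selecting which parameters to train:
model.set_learnable_params(['conv4', 'fc'])

# Or flag its parameters manually:
for p in model.layers.conv4.parameters():
    p.requires_grad = True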


In that case I’d need to change self.layers and the forward method too, sure. I was just wondering about the changes to load_model.
Good to know that the same trick works even if a new layer is not defined in the pretrained model. So basically I can add whatever I want and assign the relevant weights. Things are clearer now. Thank you :)


Something very odd happened after I switched from the semi-original code to the original code. They’re pretty much the same, with small changes and organized differently. I made the relevant adjustments, but everything related to loading the file is the same (I even copied the MDNet class).

I get a very odd error message:
RuntimeError: Error(s) in loading state_dict for Sequential:
Missing key(s) in state_dict: "fc4.1.weight", "fc4.1.bias".
Unexpected key(s) in state_dict: "fc4.0.weight", "fc4.0.bias".

The network is OK and looks like this:
Module Layer conv1.0.weight
Module Layer conv1.0.bias
Module Layer conv2.0.weight
Module Layer conv2.0.bias
Module Layer conv3.0.weight
Module Layer conv3.0.bias
Module Layer fc4.1.weight
Module Layer fc4.1.bias
Module Layer fc5.1.weight
Module Layer fc5.1.bias

But oddly, d looks like this:
(['conv1.0.weight', 'conv1.0.bias', 'conv2.0.weight', 'conv2.0.bias', 'conv3.0.weight', 'conv3.0.bias', 'fc4.0.weight', 'fc4.0.bias', 'fc5.1.weight', 'fc5.1.bias'])

Where did fc4.0.weight and fc4.0.bias come from? I load the exact same .pth file and use the exact same model class; I really have no idea how this is even possible. Any idea?

In my post above, you can see that these parameters exist in the loaded dictionary (i.e. the mdnet_vot-otb.pth file).

Yeah, I see. The thing is, the exact same code and the same weights file result in different keys.
In the older code the net has fc4.1.bias and fc4.1.weight, and in the .pth file there are also fc4.1.bias and fc4.1.weight in d.
In the new code the net has fc4.1.bias and fc4.1.weight, but in the .pth file there are fc4.0.bias and fc4.0.weight in d.

Seeing that, I copied only the network and the weight loader to a new file to see what would happen: the net has fc4.0.bias and fc4.0.weight, but the file has fc4.1.bias and fc4.1.weight.

Why is there a difference if it’s exactly the same code?
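
One likely cause, judging from the snippets in this thread: the number in a key like fc4.0.weight is simply the position of the Linear module inside the fc4 nn.Sequential. With fc4 = Sequential(Linear, ReLU) the Linear is child 0 (keys fc4.0.*), while with fc4 = Sequential(Dropout, Linear, ReLU) it is child 1 (keys fc4.1.*). So the two code versions probably differ in whether Dropout sits inside fc4. You can either align the definitions, or remap the keys before loading; a hypothetical fix inside load_model:

# Rename keys saved with the Linear at index 0 to match a model
# that has Dropout first (Linear at index 1):
d['fc4.1.weight'] = d.pop('fc4.0.weight')
d['fc4.1.bias'] = d.pop('fc4.0.bias')
self.layers.load_state_dict(d)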

I am a beginner, and I want to change the base CNN model (conv1-conv3 in MDNet) to the corresponding parts of ResNet-50.
Could you please give me an idea of how I can do that?
I’d really appreciate any help.
Thanks
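
Not an authoritative recipe, but a minimal sketch of one way to start, assuming torchvision is available: take the early ResNet-50 stages as the backbone, then size the first fc layer to match the backbone output.

import torch
import torch.nn as nn
import torchvision

# Use the ResNet-50 stem plus the first two residual stages
# (output: 512 channels) in place of MDNet's conv1-conv3.
resnet = torchvision.models.resnet50(pretrained=True)
backbone = nn.Sequential(*list(resnet.children())[:6])  # conv1 ... layer2

# Check the output shape for a 107x107 MDNet crop, then use it
# to set the in_features of the new fc4.
with torch.no_grad():
    out = backbone(torch.zeros(1, 3, 107, 107))
print(out.shape)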