Copying the weights from one model to another

Hi,

I am running into the following situation:

I trained a model named src_model using resnet18, and I want to use its first four layers and their weights, as they are, in another model, dest_model.

I saved the src_model using torch.save()
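
For reference, I save the whole module (so torch.load later returns the model object, not a state_dict); roughly this sketch:

import torch

# saving the entire module; torch.load then gives back the nn.Module instance
torch.save(src_model, 'src_model.pth')
src_model = torch.load('src_model.pth')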

I create dest_model by taking the first four layers from src_model, which is a resnet18 with an fc layer at the end:

    import torch.nn as nn
    from torchvision import models

    class dest_model(nn.Module):
        def __init__(self):
            super(dest_model, self).__init__()
            resnet = models.resnet18(pretrained=True)
            layers = list(resnet.children())
            self.features = nn.Sequential(*layers[:4])

        def forward(self, x):
            return self.features(x)
    print(dest_model())

dest_model(
  (features): Sequential(
    (0): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
    (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): ReLU(inplace=True)
    (3): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
  )
)

The first four layers of src_model and dest_model are the same, and I transferred the weights using this format:

dest_model.features[0].weight.data=src_model.features[0].weight.data.to(device).type(torch.cuda.FloatTensor)
dest_model.features[1].weight.data=src_model.features[1].weight.data.to(device).type(torch.cuda.FloatTensor)
dest_model.features[1].bias.data=src_model.features[1].bias.data.to(device).type(torch.cuda.FloatTensor)
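
For reference, an equivalent way to write this transfer (a sketch; the key names follow the nn.Sequential above) would be to filter the state_dict and load it with strict=False:

# sketch: copy every tensor of the first two modules in one go via the state_dict;
# strict=False tells load_state_dict to ignore the keys we left out
subset = {k: v for k, v in src_model.state_dict().items()
          if k.startswith('features.0.') or k.startswith('features.1.')}
dest_model.load_state_dict(subset, strict=False)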

and I saved the dest_model using torch.save(); everything works smoothly.

But the results from dest_model are different, so I manually checked the outputs of the intermediate layers using forward hooks, referring to this link.

I gave the same input sample to src_model and dest_model and took the outputs of the first four layers from both models using the forward hooks, to find out what is happening:

for src_key, dest_key in zip(src_activation, dest_activation):
    print(src_key, dest_key)
    print(torch.equal(src_activation[src_key], dest_activation[dest_key]))
    print(src_activation[src_key].shape, dest_activation[dest_key].shape)

src_activation and dest_activation are the OrderedDicts populated by the forward hooks.

Here is the result:

features.0 features.0
True
torch.Size([1, 64, 112, 112]) torch.Size([1, 64, 112, 112])
features.1 features.1
False
torch.Size([1, 64, 112, 112]) torch.Size([1, 64, 112, 112])
features.2 features.2
False
torch.Size([1, 64, 112, 112]) torch.Size([1, 64, 112, 112])
features.3 features.3
False
torch.Size([1, 64, 56, 56]) torch.Size([1, 64, 56, 56])

From these results I can see that the first layer produces the same output, while the remaining layers produce different results even though they have the same weights.

However, as I copied the weights, the results should be the same in all layers of both models, but they are not for the same input. The results start to differ at the 2nd layer, which is BatchNorm2d.

Could you please explain what is happening in the BatchNorm2d layer, and why it behaves differently in the two models even though it has the same parameters?

Thanks

How large are the absolute errors in these layers?
Since you are dealing with floating point numbers, using torch.equal doesn’t sound like a good idea, as you might run into small errors caused by the limited floating point precision.
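
You could e.g. compare the outputs with a tolerance and look at the max. absolute error instead; a small sketch:

import torch

# compare tensors with a tolerance instead of exact equality
a = torch.randn(1, 64, 112, 112)
b = a + 1e-6 * torch.randn_like(a)      # simulate a tiny floating point mismatch
print(torch.equal(a, b))                # False, although the values are "the same"
print(torch.allclose(a, b, atol=1e-5))  # True: equal within the tolerance
print((a - b).abs().max())              # max. absolute error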

Hello @ptrblck, thanks for writing.

I calculated the mean absolute error between the outputs of the layers from both models.
The result:

features.0 : mae - 0
features.1 : mae - 0.1092
features.2 : mae - 0.1092
features.3 : mae - 0.1259

This difference worries me, because due to this variation I get different results at the end of the 4th layer.

thanks

These errors look a bit high, so could you post a minimal, executable code snippet to reproduce the issue, please?

@ptrblck, Here is the code

import torch
import torch.nn as nn
from torchvision import models
from collections import OrderedDict

device = torch.device('cuda')  # device used below; assuming a GPU setup here

class src_model1(nn.Module):
    def __init__(self):
        super(src_model1, self).__init__()
        resnet = models.resnet18(pretrained=True)
        layers = list(resnet.children())
        self.features = nn.Sequential(*layers[:4])

    def forward(self, x):
        return 

class dest_model1(nn.Module):
    def __init__(self):
        super(dest_model1, self).__init__()
        resnet = models.resnet18(pretrained=True)
        layers = list(resnet.children())
        self.features = nn.Sequential(*layers[:4])

    def forward(self, x):
        return self.features(x)

src_model = src_model1()
src_model = torch.load('src_model.pth')

dest_model = dest_model1()

dest_model.features[0].weight.data=src_model.features[0].weight.data.to(device).type(torch.cuda.FloatTensor)
dest_model.features[1].weight.data=src_model.features[1].weight.data.to(device).type(torch.cuda.FloatTensor)
dest_model.features[1].bias.data=src_model.features[1].bias.data.to(device).type(torch.cuda.FloatTensor)

activation = OrderedDict()
def get_activation(name):
    def hook(model, input, output):
        activation[name] = output.detach()
    return hook

for name, layer in src_model.named_modules():
    layer.register_forward_hook(get_activation(name))

output = src_model.forward(samples) # only one sample is given as input
 
activation1 = OrderedDict()
def get_activation1(name):
    def hook1(model, input, output):
        activation1[name] = output.detach()
    return hook1

for name, layer in dest_model.named_modules():
    layer.register_forward_hook(get_activation1(name))

output1 = dest_model.forward(samples) # only one sample is given as input which is same input given to src_model

mae = nn.L1Loss()

for key, key1 in zip(activation, activation1):
    print(key, key1)
    print(torch.equal(activation[key], activation1[key1]), mae(activation[key], activation1[key1]))
    print(activation[key].shape, activation1[key1].shape)

This is the flow I used.

thanks

Thanks for the code.
A few issues:

  • your src_model1 is not executing anything in its forward and returns None
  • don’t use model.forward as it could skip hooks; call the model directly instead (a small demo below this list shows the difference)
  • don’t use the .data attribute as it’s deprecated and could easily break things
  • samples is undefined so I used a random tensor
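
To illustrate the model.forward point: forward hooks are dispatched in nn.Module.__call__, so calling .forward directly skips them (small demo):

import torch
import torch.nn as nn

m = nn.Linear(2, 2)
m.register_forward_hook(lambda mod, inp, out: print('hook fired'))
x = torch.randn(1, 2)
out = m(x)          # prints 'hook fired'
out = m.forward(x)  # bypasses __call__, so the hook is silently skipped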

After fixing these issues, your code works fine:

import torch
import torch.nn as nn
from torchvision import models
from collections import OrderedDict

class src_model1(nn.Module):
    def __init__(self):
        super(src_model1, self).__init__()
        resnet = models.resnet18(pretrained=True)
        layers = list(resnet.children())
        self.features = nn.Sequential(*layers[:4])

    def forward(self, x):
        return self.features(x)

class dest_model1(nn.Module):
    def __init__(self):
        super(dest_model1, self).__init__()
        resnet = models.resnet18(pretrained=True)
        layers = list(resnet.children())
        self.features = nn.Sequential(*layers[:4])

    def forward(self, x):
        return self.features(x)

src_model = src_model1().cuda()
dest_model = dest_model1().cuda()

# not needed as you are using the same pretrained models
#with torch.no_grad():
#    dest_model.features[0].weight.copy_(src_model.features[0].weight)
#    dest_model.features[1].weight.copy_(src_model.features[1].weight)
#    dest_model.features[1].bias.copy_(src_model.features[1].bias)

activation = OrderedDict()
def get_activation(name):
    def hook(model, input, output):
        activation[name] = output.detach()
    return hook

for name, layer in src_model.named_modules():
    layer.register_forward_hook(get_activation(name))

samples = torch.randn(1, 3, 224, 224).cuda()
output = src_model(samples) 
 
activation1 = OrderedDict()
def get_activation1(name):
    def hook1(model, input, output):
        activation1[name] = output.detach()
    return hook1

for name, layer in dest_model.named_modules():
    layer.register_forward_hook(get_activation1(name))

output1 = dest_model(samples) 

mae = nn.L1Loss()

for key, key1 in zip(activation, activation1):
    print(key, key1)
    print(torch.equal(activation[key], activation1[key1]), mae(activation[key], activation1[key1]))
    print(activation[key].shape, activation1[key1].shape)

Output:

features.0 features.0
True tensor(0., device='cuda:0')
torch.Size([1, 64, 112, 112]) torch.Size([1, 64, 112, 112])
features.1 features.1
True tensor(0., device='cuda:0')
torch.Size([1, 64, 112, 112]) torch.Size([1, 64, 112, 112])
features.2 features.2
True tensor(0., device='cuda:0')
torch.Size([1, 64, 112, 112]) torch.Size([1, 64, 112, 112])
features.3 features.3
True tensor(0., device='cuda:0')
torch.Size([1, 64, 56, 56]) torch.Size([1, 64, 56, 56])
features features
True tensor(0., device='cuda:0')
torch.Size([1, 64, 56, 56]) torch.Size([1, 64, 56, 56])
 
True tensor(0., device='cuda:0')
torch.Size([1, 64, 56, 56]) torch.Size([1, 64, 56, 56])

@ptrblck, actually, src_model is a trained model; the weights from src_model were copied to the weights of dest_model, but only for the first four layers.

I missed this line in the above code: src_model = torch.load('src_model.pth')

And samples is the input image. I removed the .data attribute while transferring the weights and predicted the sample using model(sample), but the same issue happens again…
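
I.e. the transfer now looks roughly like this sketch:

# sketch of the transfer without the .data attribute, done under no_grad
with torch.no_grad():
    dest_model.features[0].weight.copy_(src_model.features[0].weight)
    dest_model.features[1].weight.copy_(src_model.features[1].weight)
    dest_model.features[1].bias.copy_(src_model.features[1].bias)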

OK, could you then update your code snippet (or post a new one) which I can use to reproduce the issue, please?

@ptrblck, I already updated the code; I just missed this one line:

src_model = torch.load('src_model.pth')

src_model was trained on a different dataset, and I just want to use its first four layers in dest_model. For that, I transferred the weights from src_model to dest_model. When I check both models with the same input image, the first four layers should return the same output, but in this flow I get different results.

Could you please help me solve this issue? Thanks