Freezing a network problem

Hi, I want to copy the weights of one network into another network, freeze those weights, and have the second network update only the layers that differ from the first one. (The second network has some layers exactly the same as the first network and some different layers; I want to copy the first network's weights into the shared layers, freeze them, and then train the second network to learn the remaining layers' weights.)
copy:

def weights_init_receive(m):
    # note: the argument m is not used; the copy goes through the global model/model2
    model2.conv1 = copy.deepcopy(model.conv1)
    model2.conv2 = copy.deepcopy(model.conv2)
    model2.conv3 = copy.deepcopy(model.conv3)

require grad:

for name, param in model2.named_parameters():
  if param.requires_grad and 'conv1' in name:
    param.requires_grad = False
  if param.requires_grad and 'conv2' in name:
    param.requires_grad = False
  if param.requires_grad and 'conv3' in name:
    param.requires_grad = False 

set optimizer:

torch.optim.Adam(filter(lambda p: p.requires_grad, model2.parameters()), lr=0.1)
Adam (
Parameter Group 0
    amsgrad: False
    betas: (0.9, 0.999)
    eps: 1e-08
    lr: 0.1
    weight_decay: 0
)

Now here is my problem:
I have 3 conv layers with requires_grad=True, but these new layers do not change at all…
How can I fix it?

And I got this error after loss.backward():

RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
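
For reference, the overall pattern I am trying to implement looks like this (a rough sketch, assuming the shared layers are named conv1, conv2 and conv3 in both models; load_state_dict with strict=False is just one idiomatic way to do the partial copy):

shared = ('conv1', 'conv2', 'conv3')

# copy only the shared layers' weights from the first model into the second
partial = {k: v for k, v in model.state_dict().items() if k.startswith(shared)}
model2.load_state_dict(partial, strict=False)

# freeze the copied layers so only the remaining layers are trained
for name, param in model2.named_parameters():
    if name.startswith(shared):
        param.requires_grad = False

# hand the optimizer only the parameters that are still trainable
optimizer = torch.optim.Adam(
    (p for p in model2.parameters() if p.requires_grad), lr=0.1)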

Hi @lifeblack

Is the ‘element 0 of tensors’ related to a frozen conv layer or to a trainable conv layer?
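
A quick way to check (a rough sketch; loss is whatever tensor you call backward() on) is to print whether the loss is still attached to the autograd graph, and which parameters are trainable:

# right before loss.backward():
print(loss.requires_grad, loss.grad_fn)   # (False, None) means the loss is detached from the graph

# list which parameters are still trainable
for name, p in model2.named_parameters():
    print(name, p.requires_grad)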

Hi @matheo_r, thanks for the reply… Actually, I don't know what ‘element 0 of tensors’ refers to.
If it means the first layer of my network, it's the frozen one…
This is my network:
conv1: frozen
conv2: frozen
conv3: frozen
conv4: trainable
conv5: trainable
conv6: trainable
fully connected: frozen
fully connected: frozen

I am unable to reproduce the behavior that you are mentioning.

It seems it refers to conv1, but I couldn't reproduce the behavior either. Could you share the whole code?

Hi @InnovArul @matheo_r … thanks a lot for your reply…
My code is something like below:

import torch, torch.nn as nn
import torch.nn.functional as F
import os, sys
import copy
import torch.optim as optim

class ConvReLU(nn.Module):
    def __init__(self, indim, outdim):
        super().__init__()
        self.conv = nn.Conv2d(indim, outdim, kernel_size=1)

    def forward(self, x):
        return F.relu(self.conv(x))

class CNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = ConvReLU(3,6)
        self.conv2 = ConvReLU(6,12)
        self.conv3 = ConvReLU(12,18)
        self.conv4 = ConvReLU(18,18)
        self.conv5 = ConvReLU(18,18)
        self.conv6 = ConvReLU(18,18)
        self.fc1 = nn.Linear(25*18, 100)
        self.fc2 = nn.Linear(100, 10)

    def forward(self, x):
        y = copy.deepcopy(x)
        x = self.conv1(x)
        x = self.conv2(x)
        x = self.conv3(x)
        x = self.conv4(x)
        x = self.conv5(x)
        x = self.conv6(x)
        x = F.relu(self.fc1(x.view(x.shape[0], -1)))
        x = self.fc2(x)
        return x


def freeze_params(model):
    for name, param in model.named_parameters():
        if param.requires_grad and 'conv1' in name:
            param.requires_grad = False
        if param.requires_grad and 'conv2' in name:
            param.requires_grad = False
        if param.requires_grad and 'conv3' in name:
            param.requires_grad = False
        if param.requires_grad and 'fc1' in name:
            param.requires_grad = False
        if param.requires_grad and 'fc2' in name:
            param.requires_grad = False

def copy_weights(src, dst):
    dst.conv1 = copy.deepcopy(src.conv1)
    dst.conv2 = copy.deepcopy(src.conv2)
    dst.conv3 = copy.deepcopy(src.conv3)

def my_loss(x, y):
    loss = torch.norm(x - y, 2)
    return loss

if __name__ == "__main__":
    data = torch.randn(2,3,5,5).cuda()
    y = torch.randn(2,3,5,5).cuda()
    model1 = CNN().cuda()
    model2 = CNN().cuda()

    copy_weights(model1, model2)
    assert model1.conv1.conv.weight.data_ptr() != model2.conv1.conv.weight.data_ptr()
    assert model1.conv2.conv.weight.data_ptr() != model2.conv2.conv.weight.data_ptr()
    assert model1.conv3.conv.weight.data_ptr() != model2.conv3.conv.weight.data_ptr()

    freeze_params(model2)

    opt = torch.optim.Adam(filter(lambda p: p.requires_grad, model2.parameters()), lr=0.01)
    opt.zero_grad()
    out = model2(data)
    loss = my_loss(data,y)
    loss.backward()
    opt.step()

and the error is:

RuntimeError                              Traceback (most recent call last)
<ipython-input-12-c82715d18b62> in <module>()
     77     out = model2(data)
     78     loss = my_loss(data,y)
---> 79     loss.backward()
     80     opt.step()

RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

Would you mind helping me solve this?

Should the loss be loss = my_loss(out, y) instead of loss = my_loss(data, y)?

You are not using the model's output in the loss function.
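
In the posted snippet the training step would presumably become something like the following (note that the dummy target would then also have to match the output shape (2, 10) rather than (2, 3, 5, 5)):

out = model2(data)
target = torch.randn(2, 10).cuda()   # dummy target matching the output shape
loss = my_loss(out, target)          # compare the model's output, not the input data
loss.backward()
opt.step()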

Thanks @raufbhat-dev … I changed the last fc layer's requires_grad to True and the error went away… I don't know what the difference is between the code I posted here and my actual code, but loss = my_loss(out, y) is already correct there, and the problem was because of the last layer…
Is there any way to set the last layer's requires_grad = False and not face that problem?

Interestingly, I am unable to reproduce this as well, i.e., even if I set the last layer's requires_grad = False, there is no error. You can find this in the same code that I shared earlier.
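
As long as at least one tensor feeding into the loss requires grad, backward() works: freezing fc2 alone does not detach the graph, because the outputs of the trainable conv4-conv6 still flow through it. A minimal sketch with the posted CNN (the dummy target is only there to make the shapes match):

model2 = CNN().cuda()
freeze_params(model2)                 # conv1-3, fc1, fc2 frozen; conv4-6 stay trainable

out = model2(data)                    # out has shape (2, 10)
target = torch.randn(2, 10).cuda()    # dummy target matching the output shape
loss = my_loss(out, target)

print(loss.requires_grad)             # True: conv4-6 parameters still feed the graph
loss.backward()                       # no RuntimeError; only conv4-6 receive gradients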

Yes, the code that you sent is correct, but the code that I wrote still faced that error. I changed the requires_grad of the last layer and it was fixed…
@InnovArul Could you please tell me the purpose of this part?

copy_weights(model1, model2)
assert model1.conv1.conv.weight.data_ptr() != model2.conv1.conv.weight.data_ptr()
assert model1.conv2.conv.weight.data_ptr() != model2.conv2.conv.weight.data_ptr()
assert model1.conv3.conv.weight.data_ptr() != model2.conv3.conv.weight.data_ptr()

I don’t use it in my code…

Strange though. Do you use the latest version of PyTorch?
I use torch 1.6.0 and do not observe this error while running your code.

It is just to make sure that the conv layers' weights are different between the two models. The asserts do not interfere with the model's forward pass or design.
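
For illustration, data_ptr() returns the address of a tensor's underlying storage, so the asserts only check that deepcopy actually allocated separate memory (a small standalone sketch):

import copy
import torch.nn as nn

a = nn.Conv2d(3, 6, kernel_size=1)
b = copy.deepcopy(a)   # independent copy: new storage, same values
c = a                  # plain assignment: the very same module and storage

print(a.weight.data_ptr() != b.weight.data_ptr())   # True: deepcopy allocated new memory
print(a.weight.data_ptr() == c.weight.data_ptr())   # True: assignment shares the storage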