Error when setting a conv filter in a particular layer to 0

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

class my_model(nn.Module):
  def __init__(self):
    super(my_model,self).__init__()
    self.conv1 = nn.Conv2d(3,16,kernel_size=3,stride=1,padding=1)
    self.conv2 = nn.Conv2d(16,32,kernel_size=3,stride=1,padding=1)
    self.conv3 = nn.Conv2d(32,64,kernel_size=3,stride=1,padding=1)
    self.pool = nn.MaxPool2d(2, 2)
    self.fc1 = nn.Linear(4*4*64,64)
    self.fc2 = nn.Linear(64,10)
  def forward(self,inp):
    ab = self.pool(F.relu(self.conv1(inp)))
    ab = self.pool(F.relu(self.conv2(ab)))
    ab = self.pool(F.relu(self.conv3(ab)))
    ab = ab.view(ab.shape[0],-1)
    ab = F.relu(self.fc1(ab))
    ab = F.relu(self.fc2(ab))
    return ab
    
test_model = my_model()
test_model.cuda()

I trained my model, saved it, and then loaded it back:

new_test_model = my_model()
a = torch.load("/content/cifar_net.pth")

new_test_model.load_state_dict(a)

> <all keys matched>

After that I set one filter to zero:

new_test_model.conv1.weight[0]=torch.zeros_like(new_test_model.conv1.weight[0].data)

and wanted to check whether the filter gets revived after training,
so I initialized my optimizer again,
but I am getting this error:

ValueError: can’t optimize a non-leaf Tensor

I am not able to understand why.
Please help me.

Wrapping the operation into a torch.no_grad() block and using copy_ should work:

with torch.no_grad():
    model.conv1.weight[0].copy_(torch.zeros_like(model.conv1.weight[0]))

Also, don’t use the .data attribute, as it might cause unwanted side effects.

Is there any way I can set the filters to zero and make sure they are not revived?
I actually want to remove the filters.

If you want to set some filters to zero, you could either remove them from your model completely, or alternatively multiply your filters with a mask containing 1s for the valid filters and 0s for the filters you want to zero out, and apply the same mask to the gradients of these filters.
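For example, something along these lines should work (a rough sketch; the indices used for filter_mask below are made up, and the register_hook calls are just one possible way to keep these gradients at zero):

# rough sketch of the masking approach for conv1
# filter_mask has one entry per output channel: 1 keeps the filter, 0 zeroes it out
filter_mask = torch.ones(new_test_model.conv1.out_channels,
                         device=new_test_model.conv1.weight.device)
filter_mask[[0, 3, 9]] = 0.  # made-up example indices

with torch.no_grad():
    # broadcast the mask over (out_channels, in_channels, kH, kW)
    new_test_model.conv1.weight.mul_(filter_mask[:, None, None, None])
    new_test_model.conv1.bias.mul_(filter_mask)

# zero out the corresponding gradients after every backward pass
new_test_model.conv1.weight.register_hook(lambda grad: grad * filter_mask[:, None, None, None])
new_test_model.conv1.bias.register_hook(lambda grad: grad * filter_mask)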

Can I use something internally so that the gradient is not computed for these filters at all?
I tried using

new_test_model.conv1.weight[0].requires_grad = False

but got

RuntimeError: you can only change requires_grad flags of leaf variables. If you want to use a computed variable in a subgraph that doesn’t require differentiation use var_no_grad = var.detach().

Then I tried using

new_test_model.conv1.weight[0] = new_test_model.conv1.weight[0].detach()

but then I am getting

No, you can only set the requires_grad attribute for the entire tensor, not just for a specific part of it.
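For example (using conv1 from your model):

# works: requires_grad is a property of the whole parameter tensor
new_test_model.conv1.weight.requires_grad = False

# fails with the RuntimeError above, since weight[0] is a non-leaf view
# new_test_model.conv1.weight[0].requires_grad = False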

Sir, this is my model.
I want to fix some filters to zero and don’t want them to update when the model learns,
so I want to make their gradients zero as well.

class my_model(nn.Module):
  def __init__(self):
    super(my_model,self).__init__()
    self.conv1 = nn.Conv2d(3,16,kernel_size=3,stride=1,padding=1)
    self.conv2 = nn.Conv2d(16,32,kernel_size=3,stride=1,padding=1)
    self.conv3 = nn.Conv2d(32,64,kernel_size=3,stride=1,padding=1)
    self.pool = nn.MaxPool2d(2, 2)
    self.fc1 = nn.Linear(4*4*64,64)
    self.fc2 = nn.Linear(64,10)
  def forward(self,inp):
    ab = self.pool(F.relu(self.conv1(inp)))
    ab = self.pool(F.relu(self.conv2(ab)))
    ab = self.pool(F.relu(self.conv3(ab)))
    ab = ab.view(ab.shape[0],-1)
    ab = F.relu(self.fc1(ab))
    ab = F.relu(self.fc2(ab))
    return ab

> I loaded the model from a previous checkpoint

new_test_model = my_model()
a = torch.load("/content/cifar_net.pth")

new_test_model.load_state_dict(a)

new_test_model.cuda()
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(new_test_model.parameters(), lr=0.001)

> I executed this once to check whether the model was updating the weights or not

optimizer.zero_grad()
inputs, labels = next(iter(trainloader))
inputs, labels = inputs.cuda(), labels.cuda()
outputs = new_test_model(inputs)
loss = criterion(outputs, labels)
print("before loss backward")
loss.backward()
print("after loss backward")
optimizer.step()
print("after opt step")

> I want to set the weights of these layers to zero according to the dictionary below

indices = {
    'conv1':[0,3,9,5,7,2],
    'conv2':[10,13,19,25,17,22],
    'conv3':[20,31,29,15,27,12],
}

As you suggested, I used this code to set the filters to zero:

# code to make the filters zero
with torch.no_grad():
  for name, module in new_test_model.named_children():
    if isinstance(module, nn.Conv2d):
      for i in indices[name]:
        module.weight[i].copy_(torch.zeros_like(module.weight[i]))
        module.bias[i].copy_(torch.zeros_like(module.bias[i]))

Then I checked the weights; they were zero, as I wanted them to be:

# code to check whether the filters turned to zero
with torch.no_grad():
  for name, module in new_test_model.named_children():
    if isinstance(module, nn.Conv2d):
      for i in indices[name]:
        print(module.weight[i])

Then I executed the code below to compute the gradients again:

optimizer.zero_grad()
inputs, labels = next(iter(trainloader))
inputs, labels = inputs.cuda(), labels.cuda()
outputs = new_test_model(inputs)
loss = criterion(outputs, labels)
loss.backward()

And then I checked the gradients

for name, module in new_test_model.named_children():
  if isinstance(module, nn.Conv2d):
    for i in indices[name]:
      print(module.weight.grad[i, :, :, :])

They turned out to be non-zero, as expected,
because up to this point I had not zeroed them out.

Then I set the gradients to zero using the code below:

print("after loss backward")
for index, item in enumerate(new_test_model.named_children()):
  if(isinstance(item[1],nn.Conv2d)):
    wt_list = indices[item[0]]
    for i in wt_list:
      item[1].weight.grad[i,:,:,:]=0
      item[1].bias.grad[i]=0

I checked the gradients again, and they were zero as expected.

So with the gradients zero and the filters themselves zero, there should have been NO update to those filters, but they were updated anyway:

despite the gradients being zero, the filters went from zero back to non-zero values after optimizer.step().

To add to this, I also called optimizer.zero_grad() followed directly by optimizer.step(),
without calculating the loss again and executing loss.backward(),
just to check whether the model is updated even when all gradients are zero,
and the model got updated again.

Can you please explain what is happening, or what I am missing or doing wrong?

I just want to make the filters zero and don’t want them to update.

If these parameters were updated in the past and if you are using an optimizer with internal running states for all parameters (e.g. Adam), this behavior is expected.
Here is a small example.
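(A minimal sketch of what such an example could look like; the parameter shape, learning rate, and number of warm-up steps below are arbitrary.)

import torch
import torch.nn as nn
import torch.optim as optim

# Adam keeps running averages of past gradients, so a parameter can still be
# updated in a step where its gradient is exactly zero.
param = nn.Parameter(torch.ones(1))
optimizer = optim.Adam([param], lr=0.1)

# a few "real" updates so Adam builds up its internal state (exp_avg, exp_avg_sq)
for _ in range(3):
    optimizer.zero_grad()
    loss = (param ** 2).sum()
    loss.backward()
    optimizer.step()

# now zero the parameter and give it an exactly-zero gradient
with torch.no_grad():
    param.zero_()
optimizer.zero_grad()
param.grad = torch.zeros_like(param)
optimizer.step()

print(param)  # no longer zero, because the running averages are still non-zero

Even though the gradient in the last step is exactly zero, the exp_avg and exp_avg_sq buffers accumulated in the earlier steps still produce a non-zero update.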

Could this be the case, i.e. are you training these parameters and freezing them after some iterations?