How to cancel gradient at some specific weights and bias terms in a FC layer

Hi all,
Maybe someone can clarify the following doubt. I want to add new labels to a model that I have previously trained, say a resnet with 100 classes. Now I want to add 10 new classes, but I want to check the performance when training the net only over the weights and biases of these 10 new classes in the last FC layer. So I have copied the CNN layers as follows:

        for key_old, value_old in dict(model_old.named_parameters()).items():
            if not ('fc_final' in key_old):
                for key, value in dict(self.named_parameters()).items():
                    if key == key_old:
                        if value.dim() == 2:
                            value.data[:self.num_classes_old, :] = value_old.data
                            value.requires_grad = False
                        elif value.dim() == 1:
                            value.data[:self.num_classes_old] = value_old.data
                            value.requires_grad = False

where _old refers to the previously trained model and the name without _old to the new one. The last layer, the FC layer, was previously named fc_final. The check on dim (2 vs. 1) is used to distinguish between the weight and the bias terms.

For the FC layer I do something similar, but just copying the old terms and allowing the gradient:

        for key_old, value_old in dict(model_old.named_parameters()).items():
            if 'fc_final' in key_old:
                for key, value in dict(self.named_parameters()).items():
                    if key == key_old:
                        if value.dim() == 2:
                            value.data[:self.num_classes_old, :] = value_old.data
                            value.requires_grad = True
                        elif value.dim() == 1:
                            value.data[:self.num_classes_old] = value_old.data
                            value.requires_grad = True

Here, self.num_classes_old is the number of classes in the previously trained model.
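For what it is worth, the same copy can also be written without touching .data, using torch.no_grad() and copy_() (a toy sketch with made-up sizes, not my actual resnet):

```python
import torch

# toy stand-ins for the old and new final FC layers
num_classes_old, num_classes_new = 3, 5
fc_old = torch.nn.Linear(4, num_classes_old)
fc_new = torch.nn.Linear(4, num_classes_new)

# copy the rows of the old classes without going through .data
with torch.no_grad():
    fc_new.weight[:num_classes_old].copy_(fc_old.weight)
    fc_new.bias[:num_classes_old].copy_(fc_old.bias)
```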
Up to here there is no problem, everything is working. However, my doubt arises now: I am wondering if there is an option to set requires_grad to False for the previous weight and bias terms and True for the new ones, i.e. to have
value.data[:self.num_classes_old, :] with requires_grad = False
value.data[self.num_classes_old:, :] with requires_grad = True

Some more info, following the comments at:


I have tried to use detach(), changing
value[:self.num_classes_old] to value[:self.num_classes_old].detach()
However, checking the weight and bias terms I see that they still change after training, so the detachment is not working and I do not know why. Am I missing something? In fact, I do not understand exactly how detach() works; I just know that it stops the gradient propagation.
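Here is a minimal check I did (if I understand correctly, detach() just returns a new tensor and leaves the parameter itself untouched):

```python
import torch

w = torch.nn.Parameter(torch.randn(4, 3))
view = w[:2].detach()  # a new tensor outside the graph, sharing storage

print(view.requires_grad)  # False: only this new tensor is detached
print(w.requires_grad)     # True: the parameter is untouched, so the
                           # optimizer keeps updating all of its rows
```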

Finally, one more question: from:


I have seen that the use of .data is not recommended. Does anybody know why? I am using it a lot and I would like to understand why it is not recommended.

Thanks in advance.

Hello,

.detach() is safe but .data is not.
The autograd engine keeps track of in-place changes made through .detach(), but it pays no attention to .data. So if a variable is needed in the backward() pass and you change its value through .data, you will get an incorrect result and no error will be raised.
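A small example of the difference (a sketch, mutating a tensor in place just to show the two failure modes):

```python
import torch

# .data: the in-place change is invisible to autograd, so backward()
# silently uses the modified values and returns a wrong gradient
a = torch.tensor([1.0, 2.0], requires_grad=True)
b = (a * a).sum()          # backward() needs the original value of a
a.data.mul_(2.0)           # a becomes [2., 4.] with no version bump
b.backward()
print(a.grad)              # tensor([4., 8.]) instead of the correct [2., 4.]

# .detach(): shares the version counter, so the same mutation is caught
c = torch.tensor([1.0, 2.0], requires_grad=True)
d = (c * c).sum()
c.detach().mul_(2.0)       # in-place on the detached tensor bumps the version
caught = False
try:
    d.backward()
except RuntimeError:       # "... modified by an inplace operation ..."
    caught = True
print(caught)              # True
```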


Thanks for the clarification. According to that, I think it is not a problem to copy the weight and bias terms using .data. The problem appears if I use .data to modify parameters that are needed in the graph, because autograd will not propagate correctly through the variable that has been changed.

However, I still have the doubt of whether it is possible to cancel the gradient for one specific weight inside an FC layer (or any other kind of layer, in fact) or not. I see that if I try to cancel the gradient for one specific weight, say
value.data[0].requires_grad = False
it seems (just looking at the code line) that I have cancelled the gradient only for this weight. However, after this operation, all the rest of the weights also appear with requires_grad False:
print(value.data[1].requires_grad)
returns False, even though they were initially set to True. So I conclude that it is not possible, which in fact seems reasonable, but I find it a little bit confusing that I can run
value.data[0].requires_grad = False
without any warning.
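For reference, a minimal check of what seems to be happening (as far as I can tell, indexing .data builds a fresh detached tensor every time, so the flag never reaches the parameter):

```python
import torch

w = torch.nn.Parameter(torch.randn(5, 3))
tmp = w.data[0]
tmp.requires_grad = True        # only flips the flag on this temporary

print(w.data[0].requires_grad)  # False: a different temporary on every call
print(w.requires_grad)          # True: the parameter itself never changed
```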

I get your confusion. The question discussed in this thread is similar to yours.
What you want is to set requires_grad=False on slices of layer.weight; I think it has to be done manually, as mentioned in the link above.
And instead of .data, we can use with torch.no_grad(): to avoid the unsafe behaviour.
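For example, something like this (a toy sketch with made-up sizes): zero the gradient slice of the old rows before each optimizer step, so those rows are never updated:

```python
import torch

num_classes_old = 3
fc = torch.nn.Linear(4, 5)        # 5 outputs total, first 3 are the "old" classes
opt = torch.optim.SGD(fc.parameters(), lr=0.1)
w_before = fc.weight.detach().clone()

loss = fc(torch.ones(2, 4)).sum()
loss.backward()

# zero the gradient of the old rows so the step leaves them untouched
with torch.no_grad():
    fc.weight.grad[:num_classes_old] = 0.0
    fc.bias.grad[:num_classes_old] = 0.0
opt.step()

print(torch.equal(fc.weight[:num_classes_old], w_before[:num_classes_old]))  # True
```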


Great, now it is clear, thanks for the comments and the link.