Gradients for model parameters can be accessed directly (e.g. self.conv1.weight.grad). What about gradients for activations?
I use ReLU activations, so technically I could use the gradients for the biases instead. The problem is that I don't use biases in my network. So, short of computing the entire chain of gradients manually, is there a way to get them from autograd?
Why do I need them? I want to use a learnable threshold for ReLU clipping:
relu1 = torch.where(relu1 > thr, thr, relu1)
where thr is a trainable model parameter. The threshold function is not differentiable at the threshold itself, so I want to estimate its gradient from the gradients of the activations: the gradient for thr should be proportional to the sum of the gradients over the clipped activations.
I also tried print(torch.autograd.grad(loss, relu1)) after loss.backward(retain_graph=True), and that works, but if I understand correctly this repeats the backward pass, so the retain_grad() method should be more efficient, right?
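For what it's worth, a minimal sketch (tensor values are illustrative) suggests that torch.where with a learnable threshold already routes a gradient to thr through autograd: where an activation is clipped, d(output)/d(thr) = 1, so thr.grad accumulates exactly the sum of upstream gradients over the clipped positions.

```python
import torch

# Learnable clipping threshold and some activations (illustrative values).
thr = torch.tensor(0.5, requires_grad=True)
relu1 = torch.tensor([0.2, 0.7, 1.3], requires_grad=True)

# Clip activations above thr; thr broadcasts against relu1.
clipped = torch.where(relu1 > thr, thr, relu1)
clipped.sum().backward()

# thr.grad is the count of clipped positions times the upstream gradient (1.0 here):
print(thr.grad)    # tensor(2.) -> two activations (0.7, 1.3) were clipped
# Gradient flows to relu1 only where it was NOT clipped:
print(relu1.grad)  # tensor([1., 0., 0.])
```

So the "sum of gradients over the clipped activations" estimate described above is what autograd computes here anyway; the non-differentiability is only at the single point relu1 == thr.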
The “.grad” field is only populated when you call .backward(); just after creation of the Tensor it is always None. Note that for a non-leaf Tensor like relu1, you also need to call .retain_grad() on it before the backward pass, otherwise its .grad stays None even after backward().
You will need to save the “relu1” value in some way (e.g. on self, by returning it, or in a global) and then print relu1.grad after calling backward().
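A minimal sketch of this pattern (module and layer names are assumed for illustration): store the activation on self inside forward() and call .retain_grad() on it so autograd populates .grad for this non-leaf tensor during backward().

```python
import torch
import torch.nn as nn

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 3, bias=False)

    def forward(self, x):
        # Save the activation so it can be inspected after backward().
        self.relu1 = torch.relu(self.fc(x))
        # Without this, .grad of a non-leaf tensor stays None.
        self.relu1.retain_grad()
        return self.relu1

model = Net()
out = model(torch.randn(2, 4))
loss = out.sum()
loss.backward()

# Populated during the single backward pass -- no retain_graph or second
# backward needed, unlike the torch.autograd.grad(loss, relu1) approach.
print(model.relu1.grad)  # same shape as the activation, here (2, 3)
```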
I am stuck on the same problem. Could you please explain how you accessed the gradients after calling loss.backward() (or, if possible, share your script)? It would be of great help.
@albanD Sir, I have tried saving the ReLU activation in a variable in the forward function (shown below). When I print its gradient after calling model(), it prints None, as you mentioned. But when I print it after calling loss.backward(), I still get nothing.