Manually set gradient of tensor that is not being calculated automatically

I am working on a project that requires a method that is not differentiable, so I compute the gradient with respect to the method's input myself. I tried to use register_hook to return that gradient so it could flow backwards from there, but the hook is never called for this tensor. After reading a few posts here, I realized that register_hook is not called for a tensor when no gradient is being computed for it. I also tried assigning the gradient directly with my_tensor.grad = calculated_grad, but I do not think that has any effect.

Can you please help me with this? Here is a demo version of my code.

import torch
import torch.nn as nn

class MLP(nn.Module):
    def __init__(self, input_size, hidden_size, num_classes=2):
        super().__init__()
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.fc1 = nn.Linear(self.input_size, self.hidden_size)
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(self.hidden_size, num_classes)

    def forward(self, x):
        hidden = self.fc1(x)
        relu = self.relu(hidden)
        output = self.fc2(relu)
        return output

net = MLP(input_size, hidden_size, num_classes)
out = net(data)
# Apply a non-differentiable method to out.
final_output = non_diff(out)
# Gradient of the non-differentiable method with respect to out, computed by hand.
calculated_grad = grad_calc(out)
loss = criterion(final_output, target)
# Attempt to set the gradient manually.
out.grad = calculated_grad

I would really appreciate your help with this. I have tried many things to fix it, but nothing has worked.

You can use a custom Function to specify a backward for a given forward. You can see here how to do this.
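As a rough illustration of that suggestion, here is a minimal sketch of a custom Function whose forward is binary and whose backward is written by hand. The names BinaryStep, the small Linear layer, the MSE loss, and the sigmoid-derivative surrogate are all assumptions made up for this example; they stand in for the non_diff and grad_calc methods from the question, which are not shown in the thread.

```python
import torch


class BinaryStep(torch.autograd.Function):
    """Hypothetical stand-in for the non-differentiable method in the question."""

    @staticmethod
    def forward(ctx, x):
        # Save the input so backward can compute a hand-written local gradient.
        ctx.save_for_backward(x)
        return (x > 0).to(x.dtype)

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        # Assumed surrogate gradient (derivative of sigmoid); a real grad_calc goes here.
        s = torch.sigmoid(x)
        local_grad = s * (1 - s)
        # Chain rule: upstream gradient times local gradient.
        return grad_output * local_grad


torch.manual_seed(0)
net = torch.nn.Linear(4, 3)                      # toy model for illustration
data = torch.randn(8, 4)
target = torch.randint(0, 2, (8, 3)).float()

out = net(data)
final_output = BinaryStep.apply(out)             # use .apply, not the constructor
loss = torch.nn.functional.mse_loss(final_output, target)
loss.backward()

print(net.weight.grad.shape)                     # the layer's parameters now receive gradients
```

Because the backward is attached to the operation itself, there is no need to assign out.grad by hand; calling loss.backward() routes the hand-written gradient through the rest of the network.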

Thank you very much @albanD. I had thought of this but never tried it. Let me try it and report back on how it works out.

I think it worked! Thank you very much @albanD. I just have one minor point of confusion. In the backward function, do I multiply the upstream gradient by the local gradient and return the product, or is that taken care of automatically?

You have to multiply the given grad_output by the gradient of that op to get the grad_input. This is basically one application of the chain rule.

Use gradcheck, as mentioned in the doc above, to verify that your gradient formula is correct.
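A small sketch of both points, assuming a smooth toy operation (x**3, chosen only for illustration and named Cube here): the backward returns grad_output multiplied by the local derivative, and gradcheck compares that against finite differences on double-precision inputs with its default tolerances.

```python
import torch


class Cube(torch.autograd.Function):
    """Hypothetical smooth op, x**3, with a hand-written backward."""

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return x ** 3

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        # grad_input = grad_output * d(x**3)/dx, one application of the chain rule.
        return grad_output * 3 * x ** 2


# gradcheck relies on finite differences, so float64 inputs are used here.
x = torch.randn(5, 1, dtype=torch.double, requires_grad=True)
passed = torch.autograd.gradcheck(Cube.apply, (x,))
print(passed)
```

If the multiplication by grad_output were dropped, or the local derivative were wrong, gradcheck would be expected to raise a Jacobian mismatch error instead of returning.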

@albanD I actually did that with the chain rule in mind, but wanted to be sure. I will use gradcheck as stated in the doc. Thank you again for your time.

@albanD I tried using gradcheck, but it raises an error.

RuntimeError: Jacobian mismatch for output 0 with respect to input 0,
numerical:tensor([[    0.,     0.,     0.,     0.,     0.],
        [    0.,     0.,     0., -5000.,     0.],
        [    0., -5000.,     0.,  5000.,     0.],
        [    0.,     0.,     0.,     0.,     0.],
        [    0.,     0.,     0.,  5000.,  5000.]])
analytical:tensor([[-0.0346, -0.0000, -0.0000, -0.0000, -0.0000],
        [-0.0000, -0.0487, -0.0000, -0.0000, -0.0000],
        [-0.0000, -0.0000, -0.0312, -0.0000, -0.0000],
        [ 0.0000,  0.0000,  0.0000,  0.6791,  0.0000],
        [-0.0000, -0.0000, -0.0000, -0.0000, -0.4282]])

Following your reply to this post, I have created a small input for the gradcheck function.

input = (torch.randn(5,1,requires_grad=True), torch.tensor(1), torch.randn(5,1,requires_grad=True), torch.tensor(0.7))
test = torch.autograd.gradcheck(custom_back_method, input, eps=1e-4, atol=1e-6)

Following your suggestion from this post, I also tried increasing eps because I am using single precision. I tried double precision as well, which yields similar results. If I set both eps and atol to a very large value such as 1.0, gradcheck returns True, but I think that is simply because the tolerance and eps are so large.

It seems to me that the analytical gradient makes more sense than the numerical one. The numerical gradient is too large and does not have the diagonal structure that, as far as I understand, it should have. Do you have any suggestions for me?

You will need your function to be quite smooth for the numerical gradient to be precise.
If you provide double precision Tensors and use the default values for the other args to gradcheck, it should pass.

If you get 5000 for the numerical gradient, that would mean that your function is absolutely not smooth!

Yes, that is correct. The output of my custom_back_method is in fact binary (0 or 1). In this case, is it not possible to get a precise numerical gradient? The gradient value (5000) depends heavily on the eps value (1e-4).

If your forward has a binary output, then the “true” gradient will be 0 almost everywhere, so you won’t be able to use finite differences to check the gradients.
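To see where a value like 5000 can come from, here is a small pure-Python sketch. It assumes a two-sided (central) difference with step eps, and uses a made-up step function in place of the real forward: away from the jump the estimate is 0, and when the two samples straddle the jump it is 1 / (2 * eps), which is about 5000 for eps = 1e-4.

```python
def step(x):
    """Binary output, a hypothetical stand-in for a 0/1 forward."""
    return 1.0 if x > 0 else 0.0


def central_difference(f, x, eps):
    """Two-sided finite-difference estimate of df/dx at x."""
    return (f(x + eps) - f(x - eps)) / (2 * eps)


eps = 1e-4
# Both samples land on the same side of the jump: the estimate is zero.
far_from_jump = central_difference(step, 0.5, eps)
# The samples straddle the jump at x = 0: the estimate is 1 / (2 * eps).
across_jump = central_difference(step, 0.00005, eps)

print(far_from_jump, across_jump)
```

The second value scales with 1 / eps rather than converging as eps shrinks, which matches the observation that the 5000 depends heavily on the chosen eps.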

Yes, that is the case for me. Thank you. I have calculated the gradients manually and they seem right in this case. Thank you for your time and help! I really appreciate it.
