Manually set gradient of tensor that is not being calculated automatically

ksarker1 · April 20, 2020, 10:47pm

I am working on a project that requires me to write a method that is not differentiable, so I have calculated the gradient of this method w.r.t. its input myself. I tried to use register_hook to return the calculated grad so that the gradient can flow backwards from there, but the hook is never called for this tensor. After reading through a few posts here, I realized that register_hook will not be called for a tensor if no gradient is being calculated for it. I also tried to assign the gradient directly with my_tensor.grad = calculated_grad. However, I do not think that has any effect.

Can you please help me with this? I am sharing a demo version of my code.

class MLP(nn.Module):
    def __init__(self, input_size, hidden_size, num_classes=2):
        super(MLP, self).__init__()
        self.input_size = input_size
        self.hidden_size  = hidden_size
        self.fc1 = nn.Linear(self.input_size, self.hidden_size)
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(self.hidden_size,num_classes)
    def forward(self, x):
        hidden = self.fc1(x)
        relu = self.relu(hidden)
        output = self.fc2(relu)
        return output

net = MLP(input_size, hidden_size, num_classes)
out = net(data)
# out is passed through a non-differentiable method
final_output = non_diff(out)
calculated_grad = grad_calc(out)  # manually calculated gradient of non_diff w.r.t. out
loss = criterion(final_output, target)
loss.backward()
out.grad = calculated_grad
opt.step()

I would really appreciate your help on this. I have tried so many things to fix this. But nothing worked.

albanD · April 21, 2020, 3:13pm

Hi,

You can use a custom Function to specify a backward for a given forward. You can see here how to do this.

ksarker1 · April 23, 2020, 4:17pm

Thank you very much @albanD. I thought of this but never tried it. Let me try it and get back to you on how it works out.

ksarker1 · April 23, 2020, 9:35pm

I think it worked! Thank you very much @albanD. I just have one minor point of confusion. Do I multiply the upstream gradient by the local gradient myself and return that from the backward function? Or is that taken care of automatically?

albanD · April 23, 2020, 10:27pm

You have to multiply the given grad_output with the gradient of that op to get the grad_input. This is basically one application of the chain rule.
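
A minimal sketch of that multiplication, assuming an element-wise op with a binary output. The name `HardThreshold` and the sigmoid-derivative surrogate used as the local gradient are hypothetical choices for illustration, not the method discussed in this thread. For an op that is not element-wise, the product becomes a vector-Jacobian product rather than an element-wise one.

```python
import torch


class HardThreshold(torch.autograd.Function):
    """Hypothetical stand-in for a non-differentiable op: outputs 0 or 1."""

    @staticmethod
    def forward(ctx, x):
        # Save the input so backward can compute the local gradient.
        ctx.save_for_backward(x)
        return (x > 0).to(x.dtype)

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        # Hand-written local gradient (an assumption for illustration):
        # the derivative of sigmoid(x), used as a smooth surrogate.
        s = torch.sigmoid(x)
        local_grad = s * (1 - s)
        # Chain rule: upstream gradient times local gradient.
        return grad_output * local_grad


x = torch.tensor([-1.0, 0.5, 2.0], requires_grad=True)
y = HardThreshold.apply(x)

# A weighted sum makes the upstream gradient differ per element.
weights = torch.tensor([1.0, 2.0, 3.0])
(y * weights).sum().backward()

# What x.grad should be if backward applied the chain rule as written.
s = torch.sigmoid(x.detach())
expected = weights * s * (1 - s)
print(x.grad, expected)
```

Here `grad_output` arrives as `weights`, so `x.grad` ends up as `weights` times the hand-written local gradient. Autograd does not do this multiplication for a custom Function; whatever `backward` returns is passed on as is.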

Make sure to use the gradcheck as mentioned in the doc above to make sure your gradient formula is correct.
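
As a rough illustration of such a check, here is a sketch on a smooth custom Function. `Cube` is a hypothetical example, not code from this thread; the input is double precision and the other gradcheck arguments are left at their defaults.

```python
import torch
from torch.autograd import gradcheck


class Cube(torch.autograd.Function):
    """Hypothetical smooth op, y = x**3, with a hand-written backward."""

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return x ** 3

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        # Local gradient 3 * x**2, multiplied by the upstream gradient.
        return grad_output * 3 * x ** 2


# Double precision keeps the finite-difference estimate accurate enough.
x = torch.randn(5, dtype=torch.double, requires_grad=True)
passed = gradcheck(Cube.apply, (x,))
print(passed)
```

If the formula in `backward` were wrong, gradcheck would raise a Jacobian mismatch error by default instead of returning.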

ksarker1 · April 23, 2020, 11:49pm

@albanD I actually did that thinking of chain rule. But wanted to be sure. And, I will use gradcheck as stated in the doc. Thank you again for your time.

ksarker1 · April 25, 2020, 6:01pm

@albanD I tried using the gradcheck. But it is returning an error.

RuntimeError: Jacobian mismatch for output 0 with respect to input 0,
numerical:tensor([[    0.,     0.,     0.,     0.,     0.],
        [    0.,     0.,     0., -5000.,     0.],
        [    0., -5000.,     0.,  5000.,     0.],
        [    0.,     0.,     0.,     0.,     0.],
        [    0.,     0.,     0.,  5000.,  5000.]])
analytical:tensor([[-0.0346, -0.0000, -0.0000, -0.0000, -0.0000],
        [-0.0000, -0.0487, -0.0000, -0.0000, -0.0000],
        [-0.0000, -0.0000, -0.0312, -0.0000, -0.0000],
        [ 0.0000,  0.0000,  0.0000,  0.6791,  0.0000],
        [-0.0000, -0.0000, -0.0000, -0.0000, -0.4282]])

Following your reply to this post, I have created a small input for the gradcheck function.

input = (torch.randn(5,1,requires_grad=True), torch.tensor(1), torch.randn(5,1,requires_grad=True), torch.tensor(0.7))
test = torch.autograd.gradcheck(custom_back_method, input, eps=1e-4, atol=1e-6)
print(test)

I have also tried to increase eps as I am using single precision following your suggestion from this post. Please note, I also have tried double precision, which yields similar results. If I set both eps and atol to a very large value like 1.0, then gradcheck yields true. I think that is simply because of higher tolerance and high eps.

It seems to me that the analytical gradient makes more sense than the numerical one. The numerical gradient is far too large and does not keep the identity-like (diagonal) structure, which, as far as I understand, it should. Do you have any suggestions for me?

albanD · April 27, 2020, 7:59pm

You will need your function to be quite smooth for the numerical gradient to be precise.
If you provide double precision Tensors and use the default values for the other args to gradcheck, it should pass.

If you get 5000 for the numerical gradient, that would mean that your function is absolutely not smooth!

ksarker1 · April 29, 2020, 3:31pm

Yes, that is correct. The output of my custom_back_method is in fact binary (0 or 1). In this case, is it not possible to get a precise numerical gradient? This gradient value (5000) is highly dependent on the eps value (1e-4).

albanD · April 29, 2020, 4:36pm

If your forward has a binary output, then the “true” gradient will be 0 almost everywhere. So you won’t be able to use finite differences to check the gradients.
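
A small plain-Python sketch of why this happens, assuming a central difference (which is consistent with the ±5000 entries reported above at eps=1e-4). `step` and `central_difference` are hypothetical helper names used only for this illustration.

```python
def step(x):
    """Binary-output function: 1 for positive inputs, 0 otherwise."""
    return 1.0 if x > 0 else 0.0


def central_difference(f, x, eps):
    """Finite-difference estimate of df/dx at x."""
    return (f(x + eps) - f(x - eps)) / (2 * eps)


eps = 1e-4

# Away from the jump, both evaluations agree, so the estimate is 0.
away = central_difference(step, 0.5, eps)

# Within eps of the jump, the output flips between the two evaluations,
# so the estimate is 1 / (2 * eps), which is about 5000 for eps = 1e-4.
near = central_difference(step, 0.00005, eps)

# The same point with a larger eps gives a different value (about 500),
# so the estimate reflects eps rather than any property of the function.
near_larger_eps = central_difference(step, 0.00005, 1e-3)

print(away, near, near_larger_eps)
```

The estimate is either 0 or a value set by eps, so it cannot confirm or refute a hand-written surrogate gradient for a binary-output forward.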

ksarker1 · May 18, 2020, 10:03pm

Yes, that is the case for me. Thank you. I have manually calculated the grad and they seem right in this case. Thank you for your time and help! I really appreciate it.