Differentiation of a constant with autograd.grad?

Why does the last call to grad() below raise the error “RuntimeError: differentiated input is unreachable”?

from torch import Tensor
from torch.autograd import Variable
from torch.autograd import grad

x = Variable(Tensor([5]), requires_grad=True)
print('x', x)

f = x[0] * x[0] * x[0] - 3 * x[0]
print('f = x^3 - 3x', f)

f1 = grad(f, x, create_graph=True)[0]
print('f1 = 3x^2 - 3', f1)

f2 = grad(f1, x, create_graph=True)[0]
print('f2 = 6x', f2)

f3 = grad(f2, x, create_graph=True)[0]
print('f3 = 6', f3)

f4 = grad(f3, x, create_graph=True)[0]
print('f4 = 0', f4)
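For reference, here is what the same example looks like in current PyTorch (a sketch; Variable has been merged into Tensor since PyTorch 0.4, so tensors carry requires_grad directly):

```python
import torch

x = torch.tensor([5.0], requires_grad=True)

f = x[0] * x[0] * x[0] - 3 * x[0]                      # f  = x^3 - 3x = 110
f1 = torch.autograd.grad(f, x, create_graph=True)[0]   # f1 = 3x^2 - 3 = 72
f2 = torch.autograd.grad(f1, x, create_graph=True)[0]  # f2 = 6x      = 30
f3 = torch.autograd.grad(f2, x, create_graph=True)[0]  # f3 = 6

# f3 is a constant: x no longer appears in its graph, so asking for the
# gradient of f3 with respect to x raises a RuntimeError (the exact
# message differs between PyTorch versions).
raised = False
try:
    f4 = torch.autograd.grad(f3, x, create_graph=True)[0]
except RuntimeError:
    raised = True

print(f.item(), f1.item(), f2.item(), f3.item(), raised)
```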

This is because, as you stated, the graph associated with f3 outputs a constant. This means it is independent of the input x.
The call to grad with f3 therefore fails to find x in the graph of f3.
When you call grad, the second argument (here x) is the input with respect to which you want the gradient. In your case, this input is not in the graph, so you cannot get the gradient with respect to it.
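For what it’s worth, more recent PyTorch versions expose exactly this choice to the caller: grad() accepts an allow_unused=True flag, which returns None instead of raising for inputs that do not appear in the graph, and the caller can treat that None as a zero gradient. A small sketch:

```python
import torch

x = torch.tensor([5.0], requires_grad=True)
y = torch.tensor([2.0], requires_grad=True)

f = x[0] * x[0]  # f does not involve y at all

# Without allow_unused=True this would raise, because y is unreachable
# from f's graph; with it, the gradient for y comes back as None.
gx, gy = torch.autograd.grad(f, (x, y), allow_unused=True)

print(gx)  # d(x^2)/dx = 2x = 10
print(gy)  # None -- caller may interpret this as a zero gradient
```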


I understand this. My “why” was more along the lines of “how come this is the chosen behavior of .grad(), instead of simply returning zero when the variable does not appear in the graph?”

I would guess that in the pure NN case, if the output is independent of the considered input, it is most likely a bug.
Though I agree that, in a more general sense, returning zeros would make sense.

@apaszke may have a better explanation?

Even for NNs: I could imagine the same piece of code being fed either a network or a sub-part of a network, and wanting to use second-order derivatives in all cases, for instance in a penalty term, without having to care about whether that penalty is zero for some of the parameters.

Hi albanD, could I ask a question please? Based on FrancoisFleuret’s example, I defined gradients = torch.FloatTensor([0.1]) and f3 = grad(f2, x, gradients, create_graph=True)[0], and the output is 0.6.
But shouldn’t it be 6? The derivative of f2 w.r.t. x is the constant 6, no matter what the value of x is. Could you give me some instruction please?

If I understand correctly, in your case you backpropagate through the graph corresponding to f2 with an initial gradient of 0.1 instead of 1. Since the backward computation for f2 scales linearly with that initial gradient, you get 0.1 × 6 = 0.6 instead of 6. It is the output of f3 that is a constant, not the output of f2.
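A small sketch of the same point in current PyTorch, using the grad_outputs keyword (the positional gradients argument in the question):

```python
import torch

x = torch.tensor([5.0], requires_grad=True)
f = x[0] * x[0] * x[0] - 3 * x[0]                      # x^3 - 3x
f1 = torch.autograd.grad(f, x, create_graph=True)[0]   # 3x^2 - 3
f2 = torch.autograd.grad(f1, x, create_graph=True)[0]  # 6x = 30

# grad_outputs=v backpropagates v instead of 1 through f2's graph,
# so the result is v * d(f2)/dx = 0.1 * 6 = 0.6.
v = torch.tensor([0.1])
f3 = torch.autograd.grad(f2, x, grad_outputs=v, create_graph=True)[0]

print(f3.item())
```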
