Why is the last grad() below raising an error “RuntimeError: differentiated input is unreachable” ?

```python
from torch import Tensor
from torch.autograd import Variable, grad

x = Variable(Tensor([5]), requires_grad=True)
print('x', x)

f = x[0] * x[0] * x[0] - 3 * x[0]
print('f = x^3 - 3x', f)

f1 = grad(f, x, create_graph=True)[0]
print('f1 = 3x^2 - 3', f1)

f2 = grad(f1, x, create_graph=True)[0]
print('f2 = 6x', f2)

f3 = grad(f2, x, create_graph=True)[0]
print('f3 = 6', f3)

f4 = grad(f3, x, create_graph=True)[0]
print('f4 = 0', f4)
```

This is because, as you stated, the graph associated with `f3` outputs a constant: it is independent of the input `x`.
The call to `grad` on `f3` therefore fails to find `x` in the graph of `f3`.
When you call `grad`, the second argument (here `x`) is the input with respect to which you want the gradient. In your case, this input is not in the graph, so you cannot get gradients with respect to it.
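(A side note for readers on more recent PyTorch versions: `torch.autograd.grad` accepts `allow_unused=True`, which returns `None` for an unreachable input instead of raising. A minimal sketch with plain tensors, using an input `y` that the output does not depend on:)

```python
import torch
from torch.autograd import grad

x = torch.tensor([5.0], requires_grad=True)
y = torch.tensor([2.0], requires_grad=True)
out = x * x  # does not involve y at all

# grad(out, y) would raise because y is unreachable from out's graph;
# allow_unused=True returns None for that input instead.
gy = grad(out, (x, y), allow_unused=True)
print(gy)  # (tensor([10.]), None)
```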


I understand this. My “why” was more along the lines of: how come this is the chosen behavior of `grad()`, instead of simply returning zero when the variable does not appear in the graph?

I would guess that in the pure NN case, an output that is independent of the considered input is most likely a bug.
Though I agree that in a more general sense, returning zeros would make sense.

@apaszke may have a better explanation?

Even for NNs. I could imagine the same piece of code being fed a network or a sub-part of a network and wanting to use 2nd-order derivatives in all cases, for instance in a penalty term, without having to care about whether that penalty is zero for some of the parameters.
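For that penalty-term use case, one way to get the "return zeros" behavior with current PyTorch is a small wrapper over `grad(..., allow_unused=True)` that replaces `None` with zeros. `grad_or_zero` below is a hypothetical helper, not part of the PyTorch API (newer releases may also offer a `materialize_grads` argument that does this directly):

```python
import torch
from torch.autograd import grad

def grad_or_zero(output, inputs):
    # Hypothetical helper: like grad(), but inputs unreachable from
    # output's graph get a zero gradient instead of raising.
    gs = grad(output, inputs, allow_unused=True, retain_graph=True)
    return tuple(torch.zeros_like(i) if g is None else g
                 for g, i in zip(gs, inputs))

x = torch.tensor([5.0], requires_grad=True)
y = torch.tensor([2.0], requires_grad=True)
out = x * x  # independent of y
print(grad_or_zero(out, (x, y)))  # (tensor([10.]), tensor([0.]))
```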

Hi albanD, could I ask a question? Based on FrancoisFleuret’s example, I defined `gradients = torch.FloatTensor([0.1])` and `f3 = grad(f2, x, gradients, create_graph=True)[0]`, and the output is 0.6.
But shouldn’t it be 6? The derivative of `f2` w.r.t. `x` is the constant 6, no matter what the value of `x` is. Could you give me some guidance please?

If I understand correctly, in your case you backpropagate through the graph of `f2` with an initial gradient of `0.1` instead of `1`. Since the backward pass is linear in the gradient you seed it with, the result is scaled accordingly: you get `0.1 * 6 = 0.6` instead of `6`. It is the output of `f3` that is a constant, not that of `f2`.
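This scaling can be checked directly via the `grad_outputs` argument of `torch.autograd.grad`: seeding the backward pass of `f2 = 6x` with `0.1` scales the constant derivative `6` down to `0.6`. A minimal reproduction with current PyTorch tensors:

```python
import torch
from torch.autograd import grad

x = torch.tensor([5.0], requires_grad=True)
f = x[0] ** 3 - 3 * x[0]
f1 = grad(f, x, create_graph=True)[0]   # 3x^2 - 3
f2 = grad(f1, x, create_graph=True)[0]  # 6x

# Seeding the backward pass with 0.1 instead of the default 1
# scales the result: 0.1 * 6 = 0.6, not 6.
g = grad(f2, x, grad_outputs=torch.tensor([0.1]))[0]
print(g)  # tensor([0.6000])
```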
