Differentiation of a constant with autograd.grad?

Why does the last call to grad() below raise the error “RuntimeError: differentiated input is unreachable”?

from torch import Tensor
from torch.autograd import Variable
from torch.autograd import grad

x = Variable(Tensor([5]), requires_grad=True)
print('x', x)

f = x[0] * x[0] * x[0] - 3 * x[0]
print('f = x^3 - 3x', f)

f1 = grad(f, x, create_graph=True)[0]
print('f1 = 3x^2 - 3', f1)

f2 = grad(f1, x, create_graph=True)[0]
print('f2 = 6x', f2)

f3 = grad(f2, x, create_graph=True)[0]
print('f3 = 6', f3)

f4 = grad(f3, x, create_graph=True)[0]
print('f4 = 0', f4)
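For reference, here is what the same example looks like in current PyTorch (a sketch; Variable has been merged into Tensor since PyTorch 0.4, so tensors carry requires_grad directly):

```python
import torch

x = torch.tensor([5.0], requires_grad=True)

f = x[0] * x[0] * x[0] - 3 * x[0]                      # f  = x^3 - 3x = 110
f1 = torch.autograd.grad(f, x, create_graph=True)[0]   # f1 = 3x^2 - 3 = 72
f2 = torch.autograd.grad(f1, x, create_graph=True)[0]  # f2 = 6x      = 30
f3 = torch.autograd.grad(f2, x, create_graph=True)[0]  # f3 = 6

# f3 is a constant: x no longer appears in its graph, so asking for the
# gradient of f3 with respect to x raises a RuntimeError (the exact
# message differs between PyTorch versions).
raised = False
try:
    f4 = torch.autograd.grad(f3, x, create_graph=True)[0]
except RuntimeError:
    raised = True

print(f.item(), f1.item(), f2.item(), f3.item(), raised)
```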

This is because, as you stated, the graph associated with f3 outputs a constant. This means it is independent of the input x.
The call to grad with f3 therefore fails to find x in the graph of f3.
When you call grad, the second argument (here x) is the input with respect to which you want the gradient. In your case, this input is not in the graph, so you cannot get the gradient with respect to it.
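For what it’s worth, more recent PyTorch versions expose exactly this choice to the caller: grad() accepts an allow_unused=True flag, which returns None instead of raising for inputs that do not appear in the graph, and the caller can treat that None as a zero gradient. A small sketch:

```python
import torch

x = torch.tensor([5.0], requires_grad=True)
y = torch.tensor([2.0], requires_grad=True)

f = x[0] * x[0]  # f does not involve y at all

# Without allow_unused=True this would raise, because y is unreachable
# from f's graph; with it, the gradient for y comes back as None.
gx, gy = torch.autograd.grad(f, (x, y), allow_unused=True)

print(gx)  # d(x^2)/dx = 2x = 10
print(gy)  # None -- caller may interpret this as a zero gradient
```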


I understand this. My “why” was more along the lines of “how come this is the chosen behavior of .grad(), instead of simply returning zero when the variable does not appear in the graph?”

I would guess that in the pure NN case, if the output is independent of the considered input, it is most likely a bug.
Though I agree that, in a more general sense, returning zeros would make sense.

@apaszke may have a better explanation?

Even for NNs: I could imagine the same piece of code being fed either a network or a sub-part of a network, and wanting to use second-order derivatives in all cases, for instance in a penalty term, without having to care about whether that penalty is zero for some of the parameters.

Hi albanD, could I ask a question please? Based on FrancoisFleuret’s example, I defined gradients = torch.FloatTensor([0.1]) and f3 = grad(f2, x, gradients, create_graph=True)[0], and the output is 0.6.
But shouldn’t it be 6? The derivative of f2 w.r.t. x is the constant 6, no matter what the value of x is. Could you give me some instruction please?

If I understand correctly, in your case you backpropagate through the graph corresponding to f2 with an initial gradient of 0.1 instead of 1. Since the backward computation for f2 scales linearly with that initial gradient, you get 0.1 × 6 = 0.6 instead of 6. It is the output of f3 that is a constant, not the output of f2.
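A small sketch of the same point in current PyTorch, using the grad_outputs keyword (the positional gradients argument in the question):

```python
import torch

x = torch.tensor([5.0], requires_grad=True)
f = x[0] * x[0] * x[0] - 3 * x[0]                      # x^3 - 3x
f1 = torch.autograd.grad(f, x, create_graph=True)[0]   # 3x^2 - 3
f2 = torch.autograd.grad(f1, x, create_graph=True)[0]  # 6x = 30

# grad_outputs=v backpropagates v instead of 1 through f2's graph,
# so the result is v * d(f2)/dx = 0.1 * 6 = 0.6.
v = torch.tensor([0.1])
f3 = torch.autograd.grad(f2, x, grad_outputs=v, create_graph=True)[0]

print(f3.item())
```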
