I find that torch.autograd.grad behaves inconsistently. I will illustrate this with two examples.

First Example
In this example I compute derivatives up to third order of y = x^2, where x is a scalar tensor with value 3.0. The code is listed below.

import torch

x = torch.tensor(3., requires_grad=True)
y = x**2
# torch.autograd.grad returns a tuple with one entry per input, so unpack it.
# create_graph=True keeps the graph of the derivative so it can be differentiated again.
(dy_dx,) = torch.autograd.grad(outputs=(y,), inputs=(x,), create_graph=True)
print(f'dy_dx at x={x.item()}:\t\t{dy_dx.item()}')
(d2y_dx2,) = torch.autograd.grad(outputs=(dy_dx,), inputs=(x,), create_graph=True)
print(f'd2y_dx2 at x={x.item()}:\t{d2y_dx2.item()}')
(d3y_dx3,) = torch.autograd.grad(outputs=(d2y_dx2,), inputs=(x,))
print(f'd3y_dx3 at x={x.item()}:\t{d3y_dx3.item()}')

I get the expected results as shown below:

dy_dx at x=3.0: 6.0
d2y_dx2 at x=3.0: 2.0
d3y_dx3 at x=3.0: 0.0

Furthermore, the requires_grad attributes of dy_dx and d2y_dx2 are automatically True.
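The observation above can be checked directly. The following is a minimal sketch, assuming the tuple returned by torch.autograd.grad is unpacked into a single tensor; it prints the requires_grad flag of each derivative and whether it carries a grad_fn, that is, whether it is still attached to a graph leading back to x.

```python
import torch

x = torch.tensor(3.0, requires_grad=True)
y = x ** 2

# Each call returns a one-element tuple because there is one input.
(dy_dx,) = torch.autograd.grad(y, x, create_graph=True)
(d2y_dx2,) = torch.autograd.grad(dy_dx, x, create_graph=True)

# A derivative that still depends on x is a non-leaf tensor with a grad_fn.
for name, t in (("dy_dx", dy_dx), ("d2y_dx2", d2y_dx2)):
    print(f"{name}: value={t.item()}, requires_grad={t.requires_grad}, "
          f"has grad_fn={t.grad_fn is not None}")
```

Here dy_dx is the expression 2*x, built from x inside the graph, which is why it can be differentiated again.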

Second Example
In this example I tried to compute derivatives up to second order of u = 2*z, where z = x + y and x and y are 2D tensors of size 2x2. The first-order derivative is computed correctly, but computing the second-order derivative raises the error “One of the differentiated Tensors appears to not have been used in the graph. Set allow_unused=True if this is the desired behavior”. Furthermore, unlike in the first example, the requires_grad attribute of the first-order derivative is not set to True. The code and its runtime behaviour are pasted below.

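import torch

x = torch.tensor([[1., 2.], [3., 4.]], requires_grad=True)
y = torch.tensor([[5., 6.], [7., 8.]], requires_grad=True)
z = x + y
u = 2 * z

# u is not a scalar, so grad_outputs supplies the vector for the vector-Jacobian product.
grads_1 = torch.autograd.grad(outputs=(u,), inputs=(z,), grad_outputs=(torch.ones(u.size()),), create_graph=True)

# Loop body reconstructed from the output shown under Runtime Behavior.
for i in range(len(grads_1)):
    print(f'du_dz: {grads_1[i]}')
    print(f'Requires grad attribute of du_dz: {grads_1[i].requires_grad}')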

Runtime Behavior

du_dz: tensor([[2., 2.],
        [2., 2.]])
Requires grad attribute of du_dz: False
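
One possible way to isolate the difference between the two examples is sketched below. It contrasts the linear u = 2*z from this example with a squared variant u_sq = z**2, which is a hypothetical addition for comparison and not part of the original code. The assumption being tested is that for a linear u the first derivative is a constant that no longer depends on z, so there is nothing left to differentiate. The exact error message may differ between PyTorch versions, so the sketch only captures it rather than asserting its text.

```python
import torch

x = torch.tensor([[1., 2.], [3., 4.]], requires_grad=True)
y = torch.tensor([[5., 6.], [7., 8.]], requires_grad=True)
z = x + y

# Linear case from the post: du/dz is the constant 2.
u_lin = 2 * z
(g_lin,) = torch.autograd.grad(u_lin, z, grad_outputs=torch.ones_like(u_lin),
                               create_graph=True)

# Hypothetical nonlinear variant (assumption, not in the post): du/dz = 2*z.
u_sq = z ** 2
(g_sq,) = torch.autograd.grad(u_sq, z, grad_outputs=torch.ones_like(u_sq),
                              create_graph=True)

print("linear:  requires_grad =", g_lin.requires_grad)
print("squared: requires_grad =", g_sq.requires_grad)

# Differentiating the constant first derivative again fails with a RuntimeError.
try:
    torch.autograd.grad(g_lin, z, grad_outputs=torch.ones_like(g_lin))
    lin_error = None
except RuntimeError as err:
    lin_error = str(err)
print("linear second derivative error:", lin_error)

# The squared variant still depends on z, so a second derivative exists.
(g2_sq,) = torch.autograd.grad(g_sq, z, grad_outputs=torch.ones_like(g_sq))
print("squared second derivative:", g2_sq)
```

If this reading is right, the behaviour is the same rule applied to two different functions rather than an inconsistency: a derivative keeps requires_grad=True only while it is still a function of the inputs.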

RuntimeError Traceback (most recent call last)
in ()
12