I find that torch.autograd.grad behaves inconsistently. I will explain through two examples.

**First Example**

In this example I compute up to the third-order derivative of y = x^2, where x is a scalar-valued tensor with value 3.0. The code is listed below.

import torch

x = torch.tensor(3., requires_grad = True)
y = x**2

dy_dx = torch.autograd.grad(outputs = (y, ), inputs = (x, ), create_graph = True)
print(f'dy_dx at x={x.data}:\t\t{dy_dx[0].data}')

d2y_dx2 = torch.autograd.grad(outputs = dy_dx, inputs = (x, ), create_graph = True)
print(f'd2y_dx2 at x={x.data}:\t{d2y_dx2[0].data}')

d3y_dx3 = torch.autograd.grad(outputs = d2y_dx2, inputs = (x, ))
print(f'd3y_dx3 at x={x.data}:\t{d3y_dx3[0].data}')

I get the expected results as shown below:

dy_dx at x=3.0: 6.0

d2y_dx2 at x=3.0: 2.0

d3y_dx3 at x=3.0: 0.0

Further, the requires_grad attributes of dy_dx and d2y_dx2 are automatically set to True.
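For reference, these values match the hand calculation: for y = x^2 the derivatives are dy/dx = 2x (which is 6 at x = 3), d2y/dx2 = 2, and d3y/dx3 = 0.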

**Second Example**

In this example I tried to compute up to the second-order derivative of u = 2*z, where z = x+y and x and y are 2D tensors of size 2x2. The first-order derivative is computed correctly, but while computing the second-order derivative I get the error "*One of the differentiated Tensors appears to not have been used in the graph. Set allow_unused=True if this is the desired behavior*". Further, unlike in the first example, the requires_grad attribute of the first-order derivative is not set to True. I am pasting the code and its runtime behaviour below.

x = torch.tensor([[1., 2.], [3., 4.]], requires_grad = True)
y = torch.tensor([[5., 6.], [7., 8.]], requires_grad = True)
z = x+y
u = 2*z

grads_1 = torch.autograd.grad(outputs = (u, ), inputs = (z, ), grad_outputs = (torch.ones(u.size()), ), create_graph = True)
print(f'du_dz: {grads_1[0]}')
print(f'Requires grad attribute of du_dz: {grads_1[0].requires_grad}')

for i in range(len(grads_1)):
    grads_1[i].requires_grad_(requires_grad = True)

grads_2 = torch.autograd.grad(outputs = grads_1, inputs = (z, ), grad_outputs = (torch.ones(u.size()), ))
print(f'd2u_dz2: {grads_2[0]}')

**Runtime Behavior**

du_dz: tensor([[2., 2.],
        [2., 2.]])

Requires grad attribute of du_dz: False

RuntimeError                              Traceback (most recent call last)
 in ()
     11     grads_1[i].requires_grad_(requires_grad = True)
     12 
---> 13 grads_2 = torch.autograd.grad(outputs = grads_1, inputs = (z, ), grad_outputs = (torch.ones(u.size()), ))
     14 print(f'd2u_dz2: {grads_2[0]}')

/usr/local/lib/python3.6/dist-packages/torch/autograd/__init__.py in grad(outputs, inputs, grad_outputs, retain_graph, create_graph, only_inputs, allow_unused)
    147     return Variable._execution_engine.run_backward(
    148         outputs, grad_outputs, retain_graph, create_graph,
--> 149         inputs, allow_unused)
    150 
    151 

RuntimeError: One of the differentiated Tensors appears to not have been used in the graph. Set allow_unused=True if this is the desired behavior.
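For reference, my hand calculation for this example gives du/dz = 2 (a constant that does not depend on z) and d2u/dz2 = 0, so I expected a tensor of zeros rather than an error.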

I seek clarification on why the two examples behave differently.
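In case it helps frame the question, below is a minimal sketch (my assumption, I have not run this exact snippet) of what I believe would happen if I follow the error message and pass allow_unused=True to the second call: as far as I understand, the call would then succeed but return None for z rather than the tensor of zeros I actually expect.

```python
# Minimal sketch (my assumption, not verified output): the second example,
# but with allow_unused=True on the second grad call as the error suggests.
import torch

x = torch.tensor([[1., 2.], [3., 4.]], requires_grad=True)
y = torch.tensor([[5., 6.], [7., 8.]], requires_grad=True)
z = x + y
u = 2 * z

grads_1 = torch.autograd.grad(outputs=(u,), inputs=(z,),
                              grad_outputs=(torch.ones(u.size()),),
                              create_graph=True)

# du/dz comes back as a constant tensor that does not require grad,
# so, as in my original code, mark it as requiring grad before reusing it.
grads_1[0].requires_grad_(True)

grads_2 = torch.autograd.grad(outputs=grads_1, inputs=(z,),
                              grad_outputs=(torch.ones(u.size()),),
                              allow_unused=True)
print(grads_2)  # I expect (None,) here, not a zero tensor
```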