Trouble with torch.autograd.grad

I find that torch.autograd.grad behaves inconsistently. I will explain with two examples.

First Example
In this example I compute derivatives of y = x^2 up to third order, where x is a scalar-valued tensor with value 3.0. The code is listed below.

import torch

x = torch.tensor(3., requires_grad=True)
y = x**2

# create_graph=True so each gradient can itself be differentiated
dy_dx = torch.autograd.grad(outputs=(y,), inputs=(x,), create_graph=True)
print(f'dy_dx at x={x.data}:\t\t{dy_dx[0].data}')
d2y_dx2 = torch.autograd.grad(outputs=dy_dx, inputs=(x,), create_graph=True)
print(f'd2y_dx2 at x={x.data}:\t{d2y_dx2[0].data}')
d3y_dx3 = torch.autograd.grad(outputs=d2y_dx2, inputs=(x,))
print(f'd3y_dx3 at x={x.data}:\t{d3y_dx3[0].data}')

I get the expected results (analytically dy/dx = 2x = 6 at x = 3, d2y/dx2 = 2, and d3y/dx3 = 0), as shown below:

dy_dx at x=3.0: 6.0
d2y_dx2 at x=3.0: 2.0
d3y_dx3 at x=3.0: 0.0

Further, the requires_grad attribute of dy_dx[0] and d2y_dx2[0] is automatically True.
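
To make this concrete, here is a minimal standalone version of the same check; the grad_fn inspection at the end is an extra line I added, not part of the example above.

import torch

x = torch.tensor(3., requires_grad=True)
y = x**2

# With create_graph=True the returned gradient (2*x) still depends on x,
# so it remains attached to the autograd graph.
(dy_dx,) = torch.autograd.grad(outputs=(y,), inputs=(x,), create_graph=True)
print(dy_dx.requires_grad)  # True
print(dy_dx.grad_fn)        # not None, so it can be differentiated again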

Second Example
In this example I tried to compute up to second-order derivatives of u = 2*z, where z = x + y and x and y are 2x2 tensors. The first-order derivative is computed correctly, but while computing the second-order derivative I get the error "One of the differentiated Tensors appears to not have been used in the graph. Set allow_unused=True if this is the desired behavior". Further, unlike in the first example, the requires_grad attribute of the first-order derivative is not set to True. I'm pasting the code and its runtime behavior below, followed by a reduced standalone check.

x = torch.tensor([[1., 2.], [3., 4.]], requires_grad=True)
y = torch.tensor([[5., 6.], [7., 8.]], requires_grad=True)
z = x + y
u = 2*z

grads_1 = torch.autograd.grad(outputs=(u,), inputs=(z,), grad_outputs=(torch.ones(u.size()),), create_graph=True)
print(f'du_dz: {grads_1[0]}')
print(f'Requires grad attribute of du_dz: {grads_1[0].requires_grad}')

for i in range(len(grads_1)):
    grads_1[i].requires_grad_(requires_grad=True)

grads_2 = torch.autograd.grad(outputs=grads_1, inputs=(z,), grad_outputs=(torch.ones(u.size()),))
print(f'd2u_dz2: {grads_2[0]}')

Runtime Behavior

du_dz: tensor([[2., 2.],
[2., 2.]])
Requires grad attribute of du_dz: False

RuntimeError                              Traceback (most recent call last)
in ()
     11     grads_1[i].requires_grad_(requires_grad=True)
     12 
---> 13 grads_2 = torch.autograd.grad(outputs=grads_1, inputs=(z,), grad_outputs=(torch.ones(u.size()),))
     14 print(f'd2u_dz2: {grads_2[0]}')

/usr/local/lib/python3.6/dist-packages/torch/autograd/__init__.py in grad(outputs, inputs, grad_outputs, retain_graph, create_graph, only_inputs, allow_unused)
    147     return Variable._execution_engine.run_backward(
    148         outputs, grad_outputs, retain_graph, create_graph,
--> 149         inputs, allow_unused)
    150 
    151 

RuntimeError: One of the differentiated Tensors appears to not have been used in the graph. Set allow_unused=True if this is the desired behavior.
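
For reference, the reduced standalone snippet below reproduces the requires_grad observation from this example; the grad_fn check and torch.ones_like are additions of mine that are not in the code above.

import torch

x = torch.tensor([[1., 2.], [3., 4.]], requires_grad=True)
y = torch.tensor([[5., 6.], [7., 8.]], requires_grad=True)
z = x + y
u = 2*z

(du_dz,) = torch.autograd.grad(outputs=(u,), inputs=(z,),
                               grad_outputs=(torch.ones_like(u),),
                               create_graph=True)

# Unlike in the first example, the returned gradient does not require grad,
# and, consistent with that, it carries no grad_fn.
print(du_dz.requires_grad)  # False, matching the output above
print(du_dz.grad_fn)        # None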

I would appreciate clarification on why the two examples behave differently.