Here is the code that I am running and the corresponding output:

import torch as t

l1 = t.nn.Linear(1, 1, bias=False)
l1.weight.data[:] = 2
l2 = t.nn.Linear(1, 1, bias=False)
l2.weight.data[:] = 3

inputs = t.tensor([5.], requires_grad=True)
a = l2(l1(inputs))

# autograd.grad returns a tuple with one entry per input, so unpack it
g, = t.autograd.grad(outputs=a, inputs=inputs, create_graph=True)
print('inputs.grad is ', g)

gp = g * 3
gp.backward()
print('l1_grad is now ', l1.weight.grad)
print('l2_grad is now ', l2.weight.grad)
inputs.grad is tensor([6.], grad_fn=<SqueezeBackward1>)
l1_grad is now tensor([[9.]])
l2_grad is now tensor([[6.]])
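For concreteness, the first printed value can be checked by hand (a plain-arithmetic sketch, where w1, w2, x stand for the weight and input scalars set above):

w1, w2, x = 2.0, 3.0, 5.0
a = w2 * w1 * x    # forward pass: 3 * 2 * 5 = 30
da_dx = w2 * w1    # d(a)/d(inputs) = 6, which matches the printed g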
Now, to my understanding, the gradients should actually be
45 for l1_grad
30 for l2_grad
However, it seems that the input value is not taken into account when computing the gradients for l1 and l2?
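For reference, this is the calculation I expected backward() to perform for the two weight gradients (a sketch using the same scalars as above; it reflects my assumption about where the factor of 5 should appear):

w1, w2, x = 2.0, 3.0, 5.0
expected_l1_grad = 3 * w2 * x    # 3 * 3 * 5 = 45
expected_l2_grad = 3 * w1 * x    # 3 * 2 * 5 = 30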