Here is the code that I am running and the corresponding output:

import torch as t

l1 = t.nn.Linear(1, 1, bias=False)
l1.weight.data[:] = 2
l2 = t.nn.Linear(1, 1, bias=False)
l2.weight.data[:] = 3

inputs = t.tensor([5.], requires_grad=True)
a = l2(l1(inputs))

# autograd.grad returns a tuple with one entry per input, so unpack it
g, = t.autograd.grad(outputs=a, inputs=inputs, create_graph=True)
print('inputs.grad is ', g)

gp = g * 3
gp.backward()
print('l1_grad is now ', l1.weight.grad)
print('l2_grad is now ', l2.weight.grad)
inputs.grad is tensor([6.], grad_fn=<SqueezeBackward1>)
l1_grad is now tensor([[9.]])
l2_grad is now tensor([[6.]])
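For concreteness, the first printed value can be checked by hand (a plain-arithmetic sketch, where w1, w2, x stand for the weight and input scalars set above):

w1, w2, x = 2.0, 3.0, 5.0
a = w2 * w1 * x    # forward pass: 3 * 2 * 5 = 30
da_dx = w2 * w1    # d(a)/d(inputs) = 6, which matches the printed g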
Now, to my understanding, the gradients should actually be
45 for l1_grad
30 for l2_grad
However, it seems that the input value is not taken into account when computing the gradients for l1 and l2?
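For reference, this is the calculation I expected backward() to perform for the two weight gradients (a sketch using the same scalars as above; it reflects my assumption about where the factor of 5 should appear):

w1, w2, x = 2.0, 3.0, 5.0
expected_l1_grad = 3 * w2 * x    # 3 * 3 * 5 = 45
expected_l2_grad = 3 * w1 * x    # 3 * 2 * 5 = 30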