Hello, while trying to use autograd to perform reverse-mode autodiff on multi-dimensional input arrays, I came across the following behaviour:
```python
import torch

a = torch.tensor([[1., 2., 3.], [4., 5., 6.]], requires_grad=True)
b = torch.tensor([10., 20., 30.], requires_grad=True)
q = 3*a**3 - b**2
q.backward(torch.ones_like(a))
print(torch.all(a.grad == 9*a**2).item(), torch.all(b.grad == -2*b).item())
```
>>> True False
Indeed, the value of `b.grad` is `-4*b`, as if the gradient had been accumulated twice. However, that is not the case: `b.register_post_accumulate_grad_hook(lambda p: print(f"b accumulating grad of {p}: {p.grad=}"))` shows that the gradient is only accumulated once.
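For completeness, the check looks roughly like this (a sketch of what I mean above; the hook is registered before calling `backward`, and the `allclose` line at the end is just my way of confirming the `-4*b` value):

```python
import torch

a = torch.tensor([[1., 2., 3.], [4., 5., 6.]], requires_grad=True)
b = torch.tensor([10., 20., 30.], requires_grad=True)

# post-accumulate hook: fires each time a gradient is accumulated into b.grad
b.register_post_accumulate_grad_hook(
    lambda p: print(f"b accumulating grad of {p}: {p.grad=}")
)

q = 3*a**3 - b**2
q.backward(torch.ones_like(a))

print(torch.allclose(b.grad, -4*b))  # True: b.grad ends up as -4*b, not -2*b
```

The hook prints a single time, so the `-4*b` is produced in one accumulation step rather than by two separate additions of `-2*b`.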
On the other hand, the result is different if `b` is already broadcast to the right shape:
```python
a = torch.tensor([[1., 2., 3.], [4., 5., 6.]], requires_grad=True)
b = torch.tensor([[10., 20., 30.], [10., 20., 30.]], requires_grad=True)
q = 3*a**3 - b**2
q.backward(torch.ones_like(a))
print(torch.all(a.grad == 9*a**2).item(), torch.all(b.grad == -2*b).item())
```
>>> True True
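If my guess is right, the two results are related by a reduction over the broadcast dimension: summing the `(2, 3)` gradient from the second case over its rows reproduces the `-4*b` of the first case. A quick sanity check along those lines (my own sketch, not taken from the docs):

```python
import torch

a = torch.tensor([[1., 2., 3.], [4., 5., 6.]])
b = torch.tensor([10., 20., 30.], requires_grad=True)                      # first case
b2 = torch.tensor([[10., 20., 30.], [10., 20., 30.]], requires_grad=True)  # second case

(3*a**3 - b**2).backward(torch.ones_like(a))   # b is broadcast from (3,) to (2, 3)
(3*a**3 - b2**2).backward(torch.ones_like(a))  # no broadcasting needed

# b2.grad is -2*b2 with shape (2, 3); summing it over the broadcast (row)
# dimension gives exactly the (3,)-shaped gradient that b received
print(torch.allclose(b.grad, b2.grad.sum(dim=0)))  # True
print(torch.allclose(b.grad, -4*b))                # True
```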
So what exactly happens here? I guess the different behaviour comes from the way the vector-Jacobian product is implemented in autograd. Could you provide some details about how it works for inputs with more than one dimension (matrices and higher-dimensional tensors)? I could not find this in the online documentation.
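To make the question a bit more concrete: expressing the same computation with `torch.autograd.functional.vjp` (assuming I am using that API correctly) should give the same gradients, so I suppose the row-summation for `b` happens somewhere inside the backward of the broadcasting ops rather than in the VJP machinery itself:

```python
import torch
from torch.autograd.functional import vjp

def f(a, b):
    return 3*a**3 - b**2  # same function as above; b broadcasts from (3,) to (2, 3)

a = torch.tensor([[1., 2., 3.], [4., 5., 6.]])
b = torch.tensor([10., 20., 30.])

# v plays the role of the tensor passed to q.backward(...)
v = torch.ones(2, 3)
out, (grad_a, grad_b) = vjp(f, (a, b), v)

print(grad_a)  # expected: 9*a**2, matching a.grad above
print(grad_b)  # expected: -4*b, i.e. tensor([-40., -80., -120.]), matching b.grad above
```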
Thank you very much.
P.S. I ran the code with PyTorch 2.5.1+cu124.