Why the gradient values seem to be reversed in Tensor.backward()

I am trying to print the gradient values of three tensors, but the printed gradients do not match my manual calculation: specifically, the gradient of a appears to be swapped with the gradient of c. Check the code below.

import torch
a = torch.tensor(3.0,requires_grad=True)
b = a*2
c = b ** 2

b.retain_grad() # non-leaf tensors only keep .grad if retain_grad() is called
c.retain_grad()
c.backward() # Computes the gradient of current tensor wrt graph leaves.
print(a.grad)
print(b.grad)
print(c.grad)

Here is the output:

tensor(24.)
tensor(12.)
tensor(1.)

These results seem expected to me.

c = (a*2)**2 = 4*a**2
dc/da = 8a = 8(3) = 24

So you’ve calculated dc/da, which is the gradient value for c, and I agree that 24 is the correct answer. However, when I print c.grad the output is 1, and confusingly a.grad prints 24. Do you see where I am confused?

Yes, that would be strange.
From your original post, though, I see that the prints are correct?

They don’t seem to be correct, in the sense that they don’t match the manual calculation you and I did. Here is what I am getting:
print(a.grad) → tensor(24.)
print(b.grad) → tensor(12.)
print(c.grad) → tensor(1.)

So you’ve calculated dc/da which is the gradient value for c,

a.grad actually means dc/da (and c.grad means dc/dc, which is 1)
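
One way to see all three of these without calling retain_grad() is torch.autograd.grad — a minimal sketch, reusing your a, b, c:

import torch

a = torch.tensor(3.0, requires_grad=True)
b = a * 2
c = b ** 2

# Gradients of c with respect to a and b (dc/dc is trivially 1)
dc_da, dc_db = torch.autograd.grad(c, (a, b))
print(dc_da)  # tensor(24.)  i.e. dc/da
print(dc_db)  # tensor(12.)  i.e. dc/db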

When using .backward() we are doing backprop, not forward-prop (which PyTorch also supports; see “Forward-mode Automatic Differentiation (Beta)” in the PyTorch tutorials)
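
For contrast, a rough sketch of forward mode using the torch.autograd.forward_ad API from that tutorial — here a tangent is pushed forward from a, rather than a gradient being pulled back from c:

import torch
import torch.autograd.forward_ad as fwAD

a = torch.tensor(3.0)
with fwAD.dual_level():
    # Seed a with a tangent of 1.0 and push it forward through the graph
    dual_a = fwAD.make_dual(a, torch.tensor(1.0))
    dual_c = (dual_a * 2) ** 2
    print(fwAD.unpack_dual(dual_c).tangent)  # tensor(24.), i.e. dc/da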

That makes sense and resolves the confusion!

a.grad actually means dc/da (and c.grad means dc/dc, which is 1)

Is there anywhere in the documentation that mentions this? I never came across it before.

Not sure it is mentioned explicitly, possibly because c.retain_grad(); c.backward() is relatively rare to do (e.g. loss.grad is never populated), so it is usually harder to get confused.

The common case is more like: I have a loss and many parameters, and it’s obvious that the gradient of the loss w.r.t. each of the params is stored in each param’s .grad field.
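
Something like this (a made-up minimal example; w and x are just placeholders):

import torch

w = torch.tensor(2.0, requires_grad=True)  # a parameter (leaf tensor)
x = torch.tensor(5.0)                      # some input data
loss = (w * x - 1.0) ** 2                  # a scalar loss

loss.backward()
print(w.grad)     # tensor(90.) -> d(loss)/dw, stored on the parameter
print(loss.grad)  # None (loss is a non-leaf and retain_grad() was not called)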

Probably. I wrote the above code just to understand how the gradients are calculated in PyTorch.

This was a helpful discussion, I appreciate your insights!