What are the differences between these code snippets for calculating gradients with autograd?

I have been trying to understand the autograd operation, which is written in different forms across several repos I have been studying for a specific topic:

u = net(x,t) 
u_t = torch.autograd.grad(u.sum(), t, create_graph=True)[0]


u = net(x,t) 
u_t = torch.autograd.grad(u, t,  grad_outputs=torch.ones_like(u), create_graph=True)[0]


u = net(x,t) 
u_t = torch.autograd.grad(u.sum(), t,  grad_outputs=torch.ones_like(u), create_graph=True)[0]

Unfortunately, the difference is not clear to me. I have two questions about this:

  1. What are the main differences between them?
  2. What is the correct way of calculating the gradients?

Thank you

The backward of .sum() is .expand(), i.e. it broadcasts the incoming gradient to the shape of u.
The first variant (by the way, I tend to write u_t, = torch.autograd.grad(...), i.e. use tuple unpacking instead of indexing) implicitly supplies a scalar 1 as the grad_outputs.
Thus the difference is that variant 1 has a first step that expands this 1 as the backward of .sum(), while variant 2 doesn't have this step. After this, the two backward passes are exactly identical.
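You can check this equivalence numerically. Below is a minimal sketch using a toy element-wise function as a stand-in for net(x, t) (the function u = x * t**2 is an assumption for illustration, not from the original post):

```python
import torch

# Toy stand-in for net(x, t): u = x * t^2, so du/dt = 2 * x * t
t = torch.linspace(0.0, 1.0, 5, requires_grad=True)
x = torch.linspace(0.0, 1.0, 5)
u = x * t**2

# Variant 1: scalar output, grad_outputs implicitly a scalar 1
u_t1, = torch.autograd.grad(u.sum(), t, create_graph=True)

# Variant 2: vector output with explicit grad_outputs of ones
u_t2, = torch.autograd.grad(u, t, grad_outputs=torch.ones_like(u), create_graph=True)

# Both give the same per-element derivative du/dt = 2 * x * t
print(torch.allclose(u_t1, u_t2))        # True
print(torch.allclose(u_t1, 2 * x * t))   # True
```

Variant 1's backward first expands the scalar 1 to the shape of u (the backward of .sum()), after which the two passes coincide.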

The third variant looks like an error to me unless u is already a scalar, because there will be a shape mismatch between the first argument (u.sum(), a scalar) and the grad_outputs (shaped like u).
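A small sketch of that failure mode (assuming a non-scalar u; the function t**2 is just a placeholder):

```python
import torch

t = torch.linspace(0.0, 1.0, 5, requires_grad=True)
u = t**2  # 5-element output, so torch.ones_like(u) has shape [5]

raised = False
try:
    # u.sum() is a scalar, but grad_outputs has shape [5]:
    # autograd rejects the shape mismatch
    torch.autograd.grad(u.sum(), t,
                        grad_outputs=torch.ones_like(u),
                        create_graph=True)
except RuntimeError as e:
    raised = True
    print(e)

print(raised)  # True
```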

Best regards


P.S.: This is a great question, I’ll re-use it as an exercise for my autograd course if you allow.


Thank you for the excellent explanation. Now everything is clear.
P.S.: Of course, you can use it wherever you want. It is my pleasure :slight_smile:
Kind regards.
