I noticed that when I leave grad_outputs as None in autograd.grad I seem to get back the same gradients as when I set it to a sequence of ones (just 1 x 1 in my case). But when I compare the resulting gradient tensors with ==, the result is mostly 0s but sometimes 1s, even though the numbers look exactly the same.
What does grad_outputs actually do in autograd.grad?
When I do it:
>>> x = torch.rand(5, requires_grad=True)
>>> y = (x**2).sum()
>>> g1 = torch.autograd.grad(y, x, torch.ones_like(y), retain_graph=True)
>>> g2 = torch.autograd.grad(y, x, None, retain_graph=True)
>>> g1[0] == g2[0]
tensor([1, 1, 1, 1, 1], dtype=torch.uint8)
Can you show your code?
In autograd.grad, if you pass grad_outputs=None, it is converted internally into a tensor of ones of the same size as the output (that’s here: https://github.com/pytorch/pytorch/blob/master/torch/autograd/
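For concreteness, here is a minimal sketch of what a non-trivial grad_outputs does: for a non-scalar output y, it is the vector v in the vector-Jacobian product vᵀJ, so each entry weights the corresponding output's contribution to the gradient (the variable names below are just illustrative):

```python
import torch

# grad_outputs is the vector v in the vector-Jacobian product v^T J.
x = torch.rand(3, requires_grad=True)
y = x ** 2  # non-scalar output, so grad_outputs is required

v = torch.tensor([1.0, 10.0, 100.0])  # arbitrary weights, not all ones
g = torch.autograd.grad(y, x, grad_outputs=v)[0]

# dy_i/dx_i = 2*x_i, so each gradient entry is scaled by v_i
assert torch.allclose(g, v * 2 * x)
```

With v = ones this reduces to the plain sum of per-output gradients, which is why None (which defaults to ones for scalar outputs) and explicit ones agree in the example above.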
@alexis-jacq Your answer still doesn’t explain the use of grad_outputs. Can you please explain the case where grad_outputs is not all ones?
@Krishna_Garg While this answer is by no means comprehensive, I have seen grad_outputs used when calculating higher-order derivatives and vector products, e.g., Hessian-vector products.
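To make the Hessian-vector product case concrete, here is a sketch (the helper name `hvp` is my own): the first grad call keeps the graph with create_graph=True, and the second uses grad_outputs=v so that differentiating the gradient gives vᵀ(∂g/∂x) = Hv without ever forming H:

```python
import torch

def hvp(f, x, v):
    """Hessian-vector product H(x) @ v via double backward (illustrative helper)."""
    g = torch.autograd.grad(f(x), x, create_graph=True)[0]  # gradient, graph kept
    # grad_outputs=v makes this v^T (dg/dx) = v^T H = H v (H is symmetric).
    return torch.autograd.grad(g, x, grad_outputs=v)[0]

x = torch.rand(3, requires_grad=True)
v = torch.rand(3)
f = lambda t: (t ** 3).sum()  # gradient is 3*x^2, Hessian is diag(6*x)
assert torch.allclose(hvp(f, x, v), 6 * x * v)
```

This is the standard Pearlmutter trick; the cost is two backward passes instead of the O(n²) cost of materializing the full Hessian.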