I noticed that when I leave grad_outputs as None in autograd.grad I seem to get back the same gradients as when I set it to a sequence of ones (just 1 x 1 in my case). But when I compare the resulting gradient tensors with ==, the result is mostly 0s but sometimes 1s, even though the numbers look exactly the same.
What does grad_outputs actually do in autograd.grad?
When I do it:
>>> x = torch.rand(5, requires_grad=True)
>>> y = (x**2).sum()
>>> g1 = torch.autograd.grad(y, x, torch.ones_like(y), retain_graph=True)
>>> g2 = torch.autograd.grad(y, x, None, retain_graph=True)
>>> g1[0] == g2[0]
tensor([1, 1, 1, 1, 1], dtype=torch.uint8)
Can you show your code?
In autograd.grad, if you pass grad_outputs=None, it is converted internally into a tensor of ones of the same size as the output (that’s here: https://github.com/pytorch/pytorch/blob/master/torch/autograd/
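For concreteness, here is a minimal sketch of what a non-trivial grad_outputs does: for a non-scalar output y, it is the vector v in the vector-Jacobian product vᵀJ, so each entry weights the corresponding output's contribution to the gradient (the variable names below are just illustrative):

```python
import torch

# grad_outputs is the vector v in the vector-Jacobian product v^T J.
x = torch.rand(3, requires_grad=True)
y = x ** 2  # non-scalar output, so grad_outputs is required

v = torch.tensor([1.0, 10.0, 100.0])  # arbitrary weights, not all ones
g = torch.autograd.grad(y, x, grad_outputs=v)[0]

# dy_i/dx_i = 2*x_i, so each gradient entry is scaled by v_i
assert torch.allclose(g, v * 2 * x)
```

With v = ones this reduces to the plain sum of per-output gradients, which is why None (which defaults to ones for scalar outputs) and explicit ones agree in the example above.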
@alexis-jacq Your answer still doesn’t explain the use of grad_outputs. Can you please explain the case where grad_outputs is not all ones?
@Krishna_Garg While this answer is by no means comprehensive, I have seen grad_outputs used when calculating higher-order derivatives and vector products, e.g., Hessian-vector products.
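To make the Hessian-vector product case concrete, here is a sketch (the helper name `hvp` is my own): the first grad call keeps the graph with create_graph=True, and the second uses grad_outputs=v so that differentiating the gradient gives vᵀ(∂g/∂x) = Hv without ever forming H:

```python
import torch

def hvp(f, x, v):
    """Hessian-vector product H(x) @ v via double backward (illustrative helper)."""
    g = torch.autograd.grad(f(x), x, create_graph=True)[0]  # gradient, graph kept
    # grad_outputs=v makes this v^T (dg/dx) = v^T H = H v (H is symmetric).
    return torch.autograd.grad(g, x, grad_outputs=v)[0]

x = torch.rand(3, requires_grad=True)
v = torch.rand(3)
f = lambda t: (t ** 3).sum()  # gradient is 3*x^2, Hessian is diag(6*x)
assert torch.allclose(hvp(f, x, v), 6 * x * v)
```

This is the standard Pearlmutter trick; the cost is two backward passes instead of the O(n²) cost of materializing the full Hessian.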