I am having trouble understanding exactly what this line means in the docs

``````grad_outputs (sequence of Tensor) – The “vector” in the Jacobian-vector product. Usually gradients w.r.t. each output. None values can be specified for scalar Tensors or ones that don’t require grad. If a None value would be acceptable for all grad_tensors, then this argument is optional. Default: None.
``````

I see this thread which partially explains it (`None` is equivalent to passing in `torch.ones(...)` of the proper size), but I still don’t really understand what it is for or when it should be used.

Any input? Thanks


Hi,

`None` is equivalent to passing in `torch.ones(...)` of the proper size

This is only true for an output with a single element!

Otherwise, you can see these outputs as providing `dL/dout` (where `L` is your loss) so that the autograd can compute `dL/dw` (where `w` are the parameters for which you want the gradients) as `dL/dw = dL/dout * dout/dw`.
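To make that chain rule concrete, here is a small sketch (the shapes and functions are hypothetical) where `grad_outputs` is exactly `dL/dout`, so the result matches differentiating `L` directly:

```python
import torch

# Hypothetical setup: w is a parameter, out = f(w), L = g(out).
w = torch.tensor([1.0, 2.0], requires_grad=True)
out = w ** 2            # out = f(w)
L = out.sum()           # L = g(out), a scalar loss

# dL/dout is all ones here, because L = out.sum()
dL_dout = torch.ones_like(out)

# Passing dL/dout as grad_outputs gives dL/dw = dL/dout * dout/dw
(dL_dw,) = torch.autograd.grad(out, w, grad_outputs=dL_dout)

# Same result as differentiating L directly: dL/dw = 2 * w
print(dL_dw)
```
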

Another way to see this, as mentioned in the doc, is that autograd only computes a vector-matrix product between a vector `v` and the Jacobian of the function. `grad_outputs` allows you to specify this vector `v`.
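As a sketch of that vector-Jacobian view (the function here is a hypothetical elementwise one, chosen so the Jacobian is easy to check by hand), `grad` returns `v @ J` rather than `J` itself:

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = x * x               # vector output; Jacobian J = diag(2 * x)

# The "vector" in the vector-Jacobian product
v = torch.tensor([1.0, 0.5, 0.25])
(vjp,) = torch.autograd.grad(y, x, grad_outputs=v)

# For a diagonal Jacobian, v @ J = v * 2 * x
print(vjp)
```

Note that because `y` has more than one element, omitting `grad_outputs` here would raise an error rather than default to ones.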

Thanks for your answer, so the vector passed in will not be mutated, but it will have an effect on the final gradients that come out of the `grad` function?

Is there a simple use case to illustrate why someone would need this?

In most cases, you can do without it, but for example, you can replace:

``````loss = l1 + 2 * l2
``````

by

``````autograd.grad((l1, l2), inp, grad_outputs=(torch.ones_like(l1), 2 * torch.ones_like(l2)))
``````

Which is going to be slightly faster.
Also, some algorithms require you to compute `x * J` for some `x`. You can avoid having to compute the full Jacobian `J` by simply providing `x` as a grad_output.
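To check that the two formulations above agree, here is a minimal sketch (the losses `l1` and `l2` are hypothetical scalars, as assumed in the example):

```python
import torch

inp = torch.tensor([1.0, 2.0], requires_grad=True)

def losses(inp):
    l1 = (inp ** 2).sum()   # hypothetical scalar loss 1
    l2 = inp.sum()          # hypothetical scalar loss 2
    return l1, l2

# Variant 1: build the combined loss, then differentiate it.
l1, l2 = losses(inp)
(g1,) = torch.autograd.grad(l1 + 2 * l2, inp)

# Variant 2: weight each loss via grad_outputs instead.
l1, l2 = losses(inp)
(g2,) = torch.autograd.grad(
    (l1, l2), inp,
    grad_outputs=(torch.ones_like(l1), 2 * torch.ones_like(l2)),
)

# Both compute d(l1 + 2*l2)/dinp = 2*inp + 2
print(g1, g2)
```
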


Thanks for the help. Just one more thing: it seems from the code you posted that passing in `torch.ones(...)` will not have a material effect on the final outcome, right? That seems to conflict with the comment about a single element, but I am not sure.

I assumed above that `l1` and `l2` are scalar values, sorry!
I just use `ones_like()` to get a Tensor with a 1 on the right device and with the right dtype.


This example was really useful for me to understand the `grad_outputs` argument. I think it could be added to the autograd documentation to help more people like me.

Thanks for that answer. I would add that `torch.ones` can be seen as the derivative of the identity map; this is how backward differentiation gets initialized. It acts as a seed, in some sense!