Why do we need to pass the gradient parameter to the backward function in PyTorch?

Hi,

Consider a function f with n_input inputs and n_output outputs, and its Jacobian matrix J_f of size (n_output x n_input) containing all of its partial derivatives.
What backpropagation (or AD, whichever way you want to name it) computes is v^T J_f for a given vector v.
If your function has a single output, it makes sense to take v = 1, so that backprop returns J_f directly.
But if you have multiple outputs, there is no good default, and so we require the user to provide the v they want. That is the `gradient` argument to `backward()`.
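For example, here is a minimal sketch with a toy 2-input, 3-output function (the function and values are just made up for illustration):

```python
import torch

x = torch.tensor([1.0, 2.0], requires_grad=True)
y = torch.stack([x[0] * x[1], x[0] ** 2, x[1] ** 3])  # 3 outputs

# y is not a scalar, so backward() needs the vector v (the `gradient` argument).
v = torch.tensor([1.0, 0.0, 0.0])
y.backward(gradient=v)

# x.grad now holds v^T J_f; with this v, that is the first row of the Jacobian.
print(x.grad)  # tensor([2., 1.])
```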

If you want to reconstruct the full J_f, you have to do as many backward passes as there are outputs of your function. You can use `torch.autograd.functional.jacobian` if you need that, as in the sketch below.
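A small example with the same toy function as above; the shapes shown are just for illustration:

```python
import torch

def f(x):
    return torch.stack([x[0] * x[1], x[0] ** 2, x[1] ** 3])

x = torch.tensor([1.0, 2.0])

# Full (3 x 2) Jacobian; internally this does one backward per output.
J = torch.autograd.functional.jacobian(f, x)
print(J)
# tensor([[ 2.,  1.],
#         [ 2.,  0.],
#         [ 0., 12.]])
```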
