Question on Autograd -- Compute Jacobian, non scalar output

I have a neural network $f: \mathbb{R}^n \times \mathbb{R}^c \to \mathbb{R}^m$, and each layer computes $W_1 b + W_2$, where $b \in \mathbb{R}^c$ is a parameter. I would like to compute the derivative (Jacobian) of the output with respect to $b$, i.e. $\partial f(x)/\partial b$.

When I try autograd.grad(output, b), where the output is in $\mathbb{R}^m$ and $b \in \mathbb{R}^c$, it raises “grad can be implicitly created only for scalar outputs”. Is there any way to achieve this? Thank you.


Yes, this is a limitation of reverse-mode AD: it only computes vector–Jacobian products, so a non-scalar output needs an explicit grad_outputs vector. To get the full Jacobian you have to call autograd.grad once per output component (each call with a one-hot grad_outputs vector gives one row of the Jacobian).
Note that if you use the nightly builds of PyTorch, we added a helper for this here.
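A minimal sketch of the row-by-row approach, using an illustrative single affine layer $W_1 b + W_2$ standing in for your network (the sizes and names are made up for the example):

```python
import torch

# Illustrative dimensions: output in R^m, parameter b in R^c.
m, c = 5, 3

b = torch.randn(c, requires_grad=True)
W1 = torch.randn(m, c)
W2 = torch.randn(m)

def f(b):
    # Stand-in for the network: one affine layer W1 @ b + W2.
    return W1 @ b + W2

out = f(b)  # shape (m,), non-scalar

# Full Jacobian, one vector-Jacobian product per output component:
# the i-th one-hot grad_outputs vector extracts row i of the Jacobian.
rows = []
for i in range(out.numel()):
    v = torch.zeros_like(out)
    v[i] = 1.0
    (g,) = torch.autograd.grad(out, b, grad_outputs=v, retain_graph=True)
    rows.append(g)
jac = torch.stack(rows)  # shape (m, c); here it equals W1 exactly
```

On recent PyTorch versions the same result is available in one call via `torch.autograd.functional.jacobian(f, b)`.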