I have a neural network $f: \mathbb{R}^n \times \mathbb{R}^c \to \mathbb{R}^m$ and in each layer, I have $W_1 b + W_2$, where $b \in \mathbb{R}^c$ is some parameter. I would like to compute a derivative (jacobian) with respect to $b$ (e.g. $d/db f(x)$).

When I try to use autograd.grad (e.g. autograd.grad(output which is $\mathbb{R}^n$, $b$ which is $\mathbb{R}^c$) it says, “grad can be implicitly created only for scalar output”. Are there any methods to achieve this? Thank you.