I have a function f(x; a) between two vector spaces, parametrised by a. I'd like to take the derivative of the Jacobian D_x f(x; a) with respect to the parameters a, that is ∇_a (D_x f(x; a)). It's certainly possible to code up the Jacobian by hand and call `backward()` on it, like this (I've used a 1D input space for simplicity):

```python
import torch as th
from torch import nn

def sech2(t):
    # derivative of tanh: d/dt tanh(t) = sech^2(t)
    return 1.0 / (th.cosh(t) * th.cosh(t))

# Forward pass (note: the 'jj' index pattern picks out the diagonal of M2)
M1 = nn.Parameter(th.tensor([[1.0, 2.0, 3.0]]))
x = th.tensor([1.0], requires_grad=True)
y1_pre = th.einsum('i,ij->j', x, M1)
y1 = th.tanh(y1_pre)
M2 = nn.Parameter(th.tensor([[1.0, 0.0, 1.0],
                             [0.0, 1.0, 0.0],
                             [1.0, 0.0, 1.0]]))
y2_pre = th.einsum('j,jj->j', y1, M2)
y2 = th.tanh(y2_pre)
M3 = nn.Parameter(th.tensor([[10.0, 10.0, 10.0]]))
y = th.einsum('j,ij->i', y2, M3)

# Backward pass: hand-rolled Jacobian dy/dx via the chain rule
B1 = sech2(y1_pre) * M1
B2 = sech2(y2_pre) * M2
B3 = th.einsum('jj,ij->ij', B2, B1)
jacob = th.einsum('ij,ij->i', M3, B3)
jacob.backward()  # accumulates ∇_a (dy/dx) into M1.grad, M2.grad, M3.grad
```
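For reference, the hand-rolled Jacobian above can be sanity-checked against `torch.autograd.functional.jacobian` (the wrapper function `f` below exists only for this comparison and isn't part of the computation I actually want):

```python
import torch as th
from torch import nn

# Same toy network as above, wrapped as a function of x alone
M1 = nn.Parameter(th.tensor([[1.0, 2.0, 3.0]]))
M2 = nn.Parameter(th.tensor([[1.0, 0.0, 1.0],
                             [0.0, 1.0, 0.0],
                             [1.0, 0.0, 1.0]]))
M3 = nn.Parameter(th.tensor([[10.0, 10.0, 10.0]]))

def f(x):
    y1 = th.tanh(th.einsum('i,ij->j', x, M1))
    y2 = th.tanh(th.einsum('j,jj->j', y1, M2))  # 'jj' picks the diagonal of M2
    return th.einsum('j,ij->i', y2, M3)

x = th.tensor([1.0])
jac = th.autograd.functional.jacobian(f, x)  # shape (1, 1): dy/dx
```

This only confirms the value of the Jacobian itself; it doesn't by itself give ∇_a of it.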

but this seems rather clunky, especially as I'd ultimately like to be using PyTorch's built-in network layers.

Does anybody have a more sophisticated way of doing this? I've tried calling `backward()` on the grad of the output, but that doesn't work.
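Concretely, the failed attempt looks roughly like this (a minimal sketch on a one-layer stand-in; `M`, `x`, and `y` are placeholders for the real model):

```python
import torch as th
from torch import nn

M = nn.Parameter(th.tensor([[1.0, 2.0, 3.0]]))
x = th.tensor([1.0], requires_grad=True)
y = th.tanh(th.einsum('i,ij->j', x, M)).sum()

# First differentiation gives dy/dx, but with the default
# create_graph=False no graph is attached to the result...
(g,) = th.autograd.grad(y, x)

# ...so differentiating it again fails:
# g.backward()  # RuntimeError: the grad tensor has no grad_fn
```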