Using grad on a Jacobian

I have a function f(x; a) between two vector spaces, parametrised by a. I'd like to take the derivative of the Jacobian D_x f(x; a) with respect to the parameters a, that is ∇_a (D_x f(x; a)). It's obviously possible to code up the Jacobian by hand and call backward() on it, like this (I've used a 1D input for simplicity):

import torch as th
from torch import nn

def sech2(y1_pre):
    # sech^2(y) = 1 / cosh^2(y), the derivative of tanh(y)
    return 1.0 / (th.cosh(y1_pre) * th.cosh(y1_pre))

# Forward pass
M1 = nn.Parameter(th.tensor([[1.0, 2.0, 3.0]]))
x = th.tensor([1.0], requires_grad=True)
y1_pre = th.einsum('i,ij->j', x, M1)
y1 = th.tanh(y1_pre)
M2 = nn.Parameter(th.tensor([[1.0, 0.0, 1.0], [0.0, 1.0, 0.0], [1.0, 0.0, 1.0]]))
y2_pre = th.einsum('j,jj->j', y1, M2)  # repeated index: uses the diagonal of M2
y2 = th.tanh(y2_pre)
M3 = nn.Parameter(th.tensor([[10.0, 10.0, 10.0]]))
y = th.einsum('j,ij->i', y2, M3)

# Backward pass: chain rule through the three layers
B1 = sech2(y1_pre) * M1
B2 = sech2(y2_pre) * M2
B3 = th.einsum('jj,ij->ij', B2, B1)
jacob = th.einsum('ij,ij->i', M3, B3)


but this seems rather clunky, especially as I'd ultimately like to be using PyTorch's built-in network layers.

Does anybody have a more sophisticated way of doing this? I've tried calling backward() on the grad of the output, but that doesn't work.


You need to make sure to pass create_graph=True the first time you run autograd.grad (or backward()), so that you can backward through the result.
You can also use autograd.functional.jacobian(..., create_graph=True) to get the full Jacobian directly.
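A minimal sketch of the jacobian route, using a small stand-in network (the `net` below is illustrative, not the hand-rolled einsum model from the question):

```python
import torch as th
from torch import nn
from torch.autograd.functional import jacobian

net = nn.Sequential(nn.Linear(1, 3), nn.Tanh(), nn.Linear(3, 1))
x = th.tensor([1.0])

# Full Jacobian D_x f(x; a); create_graph=True keeps it on the autograd
# graph so it can itself be differentiated.
J = jacobian(lambda inp: net(inp), x, create_graph=True)

# Any scalar function of the Jacobian can now be backpropagated to the
# parameters a; here we just take its sum as an example.
J.sum().backward()
print(net[0].weight.grad)  # ∇_a of (sum of Jacobian entries)
```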


Fab, thanks for this. I've played around and found autograd.functional.jacobian(..., create_graph=True) is a lot more expensive than my posted method - hopefully there'll be a solution that is both elegant and cheap soon!
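For what it's worth, when the output is low-dimensional the first suggestion (autograd.grad with create_graph=True) can be cheaper than building the full Jacobian, since it avoids functional.jacobian's one-backward-per-output-row loop. A hedged sketch, again with an illustrative stand-in network rather than the original model:

```python
import torch as th
from torch import nn

net = nn.Sequential(nn.Linear(1, 3), nn.Tanh(), nn.Linear(3, 1))
x = th.tensor([1.0], requires_grad=True)
y = net(x)

# First derivative D_x f, kept on the graph via create_graph=True
# (y.sum() makes the output scalar for a single grad call).
(jac,) = th.autograd.grad(y.sum(), x, create_graph=True)

# A second backward through the Jacobian reaches the parameters a.
jac.sum().backward()
print(net[0].weight.grad)
```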