How to compute the finite difference Jacobian matrix

Dear community,
I need to compute the (differentiable) Jacobian matrix of y = f(z), where y is a (B, 3, 128, 128) tensor, i.e. a batch of images, and z is a (B, 64) tensor.

Computing the Jacobian matrix through the PyTorch functional jacobian (Automatic differentiation package - torch.autograd — PyTorch 1.7.1 documentation) is too slow. Thus, I am exploring the finite difference method (Finite difference - Wikipedia), which gives an approximation of the Jacobian. My implementation for B=1 is:

def get_jacobian(net, z, x):
    eps = torch.rand((x.size(0), ), device=x.device)
    delta = torch.sqrt(eps)

    x = x.view(x.size(0), -1)
    m = x.size(1)
    n = z.size(1)

    J = torch.zeros((m, n), device=x.device)
    I = torch.eye(n, device=x.device)

    for j in range(n):
        J[:, j] = (net(z+delta*I[:, j]).view(x.size(0), -1) - x) / delta

    return J

x_fake = mynetfunction(z)
J = get_jacobian(mynetfunction, z, x_fake)

However, it requires a lot of memory. Is there a way to make it better?

Hi,

I am not sure you are generating your eps correctly: torch.rand samples uniformly from [0, 1), not from a normal distribution.
Also, the value in the formula is meant for a variance close to 0.

We actually use this to check our gradients here: pytorch/gradcheck.py at master · pytorch/pytorch · GitHub
The main things are:

  • Always use double precision; otherwise the precision is really bad
  • Use delta=1e-6 for double precision. There is no need to do any sampling
  • You can use a centered difference if you need more precision: (f(z + e_t * eps/2) - f(z - e_t * eps/2)) / eps. It gets even more precise with an additional point at the center. You can check the Wikipedia page for the exact formula you need in this case.
  • Don’t create the full I matrix. Just create one vector and write a 1 at the proper index.
  • Disable autograd with @torch.no_grad() if you’re using an nn.Module, to avoid extra allocations
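
Putting these points together, here is a minimal sketch for B=1, not the gradcheck implementation. The names fd_jacobian and toy_net are hypothetical, and the toy function only stands in for your network so the result can be checked against an analytic Jacobian.

```python
import torch


@torch.no_grad()  # no autograd graph is recorded while evaluating net
def fd_jacobian(net, z, delta=1e-6):
    """Centered finite-difference Jacobian of net at z (B=1).

    Returns an (m, n) matrix with m = net(z).numel() and n = z.numel().
    """
    z = z.double()  # double precision, as recommended above
    n = z.numel()
    # One reusable perturbation vector instead of a full identity matrix.
    e = torch.zeros_like(z, memory_format=torch.contiguous_format)
    e_flat = e.view(-1)  # flat view sharing storage with e
    J = None
    for j in range(n):
        e_flat[j] = delta / 2
        col = (net(z + e) - net(z - e)).reshape(-1) / delta
        e_flat[j] = 0.0  # reset before moving to the next coordinate
        if J is None:
            # Allocate the output once the output size m is known.
            J = torch.empty(col.numel(), n, dtype=col.dtype, device=col.device)
        J[:, j] = col
    return J


# Toy stand-in for the network (hypothetical): y = tanh(z) @ W^T
torch.manual_seed(0)
W = torch.randn(5, 4, dtype=torch.double)


def toy_net(z):
    return torch.tanh(z) @ W.T


z = torch.randn(1, 4)
J = fd_jacobian(toy_net, z)

# Analytic Jacobian of the toy function: J[i, j] = W[i, j] * (1 - tanh(z_j)^2)
J_exact = W * (1 - torch.tanh(z.double()) ** 2)
print(torch.allclose(J, J_exact, atol=1e-6))
```

With a real nn.Module you would also need to convert it with net.double() so it accepts double inputs. Note that a result computed under torch.no_grad() is not differentiable with respect to z; if you need gradients through J, you have to drop the decorator and accept the extra memory.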