Is there a faster way to compute the Jacobian than autograd.functional.jacobian?

Given a big model, i.e. a vector-valued function f: R^n → R^m, f(x) = (f_1(x), f_2(x), …, f_m(x)), with n >> m and m > 1. Now I need grad(f_i) w.r.t. ONE SINGLE input x, and I need it for every f_i. I can do that very easily just by using autograd.functional.jacobian:

from torch.autograd import functional
jacobianMatrix = functional.jacobian(myFunction, x)

And from there I can access all the gradients I need for each f_i.

My question is whether there is perhaps a faster method to do exactly that. I am aware of torch.func and jacrev and their vectorization capabilities, but in my case, where I use just a single input instance, jacrev seems a bit slower and appears to have a large memory footprint.
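For reference, the jacrev version I timed looks roughly like this (sketch only, with a toy function standing in for my actual model):

import torch
from torch.func import jacrev

# toy stand-in for my actual model: f maps R^n -> R^m with n >> m
def myFunction(x):
    return torch.stack([x.pow(2).sum(), x.sin().sum()])

x = torch.randn(1000)

# jacrev builds the full Jacobian from vector-Jacobian products, vectorized with vmap
jacobianMatrix = jacrev(myFunction)(x)  # shape (2, 1000)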

On the other hand, I could use autograd.grad and just compute Jacobian-transpose vector products, using the unit vectors as grad_outputs to extract the individual gradients from the Jacobian. I tested this version too, and it is slower than just computing the full Jacobian.
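Roughly, this is the version I benchmarked (again just a sketch with the same kind of toy stand-in):

import torch

# toy stand-in for the real model
def myFunction(x):
    return torch.stack([x.pow(2).sum(), x.sin().sum()])

x = torch.randn(1000, requires_grad=True)
y = myFunction(x)

rows = []
for i in range(y.numel()):
    e_i = torch.zeros_like(y)
    e_i[i] = 1.0  # unit vector selecting output f_i
    # J^T e_i is exactly grad(f_i) w.r.t. x (one backward pass per output)
    (g,) = torch.autograd.grad(y, x, grad_outputs=e_i, retain_graph=True)
    rows.append(g)

jacobianMatrix = torch.stack(rows)  # same values as functional.jacobian(myFunction, x)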

Is there maybe another method I could try?

Why can I only compute gradients w.r.t. one instance at a time? I don’t know the other inputs in advance, because input1 is calculated from the gradients w.r.t. input0, so vectorizing over inputs won’t help me in this case.

Hi Stev!

It’s not clear to me what you’re asking.

If by “ONE SINGLE input x” you mean that you have a function from R^n to R^m,
f (x_1, x_2, …, x_n) → (f_1 (x_1, x_2, …, x_n), …, f_m (x_1, x_2, …, x_n)), and
you want the partial derivatives with respect to just one of the x_i, say, x_7, then you
should use forward-mode automatic differentiation. If you create a “dual tensor” for x_7,
you will be able to compute all of the derivatives d f_i / d x_7 using just a single
forward-mode forward pass.
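As a rough sketch (with a made-up f and a made-up seed index standing in for your x_7), using torch.autograd.forward_ad:

import torch
import torch.autograd.forward_ad as fwAD

# made-up f: R^3 -> R^2, just for illustration
def f(x):
    return torch.stack([x.pow(2).sum(), x.prod()])

x = torch.randn(3)

# seed a tangent that is 1.0 in the slot playing the role of x_7 (index 1 here)
tangent = torch.zeros_like(x)
tangent[1] = 1.0

with fwAD.dual_level():
    dual_x = fwAD.make_dual(x, tangent)
    out = f(dual_x)
    # the output's tangent holds d f_i / d x_1 for all i, from one forward pass
    primal, jvp = fwAD.unpack_dual(out)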

(If this is not what you are asking, please clarify.)

Best.

K. Frank

Hey Frank!

I just need d f_i(x)/dx (for all i), where x is a vector. With “one single input” I wanted to emphasize that I’m not looking for the fastest method to compute this quantity for batches, but for the fastest one for just a single input.

Best.

Stev

Hi Stev!

Okay, I think I understand now.

You said that the dimensionality of your input, n, is much greater than the dimensionality
of your output, m. Therefore you want to use regular “reverse-mode” autograd, rather
than forward-mode autograd: reverse mode needs one backward pass per output (so m of
them), while forward mode would need one forward pass per input (n of them).

autograd.functional.jacobian() (with vectorize=True and the default strategy of
'reverse-mode') should be the fastest approach. Note that the documentation
suggests that jacrev() is “less experimental,” but the two should be doing basically the same
thing under the hood, namely using a loop to perform a separate backward pass for each
element of the output, but with that loop “vectorized” using vmap().
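In code, that would be something along these lines (with a toy f and x standing in for your model and input):

import torch
from torch.autograd.functional import jacobian

# toy f and x; in your case n >> m
def f(x):
    return torch.stack([x.pow(2).sum(), x.sin().sum()])

x = torch.randn(1000)

# reverse mode (the default strategy), with the per-output backward passes
# batched via vmap rather than an explicit python loop
J = jacobian(f, x, vectorize=True)  # shape (2, 1000)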

Best.

K. Frank


Hi Frank!

In my case, jacrev() really makes my notebook laggy, and I had several kernel crashes running it. I think I read somewhere that jacrev, in contrast to autograd.functional.jacobian(), trades a larger memory footprint for faster execution?

In any case, thank you for your help!

Best.

Stev