Is there a faster way to compute the Jacobian than autograd.functional.jacobian?

Given a big model, i.e. a vector-valued function f: R^n → R^m, f(x) = (f_1(x), f_2(x), …, f_m(x)), with n >> m and m > 1. Now I need grad(f_i) w.r.t. ONE SINGLE input x, and I need it for every f_i. I can do that very easily just by using autograd.functional.jacobian:

from torch.autograd import functional
jacobianMatrix = functional.jacobian(myFunction, x)

And from there I can access all the gradients I need for each f_i.

My question is whether there is perhaps a faster method to do exactly that. I am aware of torch.func and jacrev and their vectorization capabilities, but in my case, where I use just a single input instance, jacrev seems a bit slower and appears to have a large memory footprint.
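For reference, the jacrev version I timed looks roughly like this (sketch only, with a toy function standing in for my actual model):

import torch
from torch.func import jacrev

# toy stand-in for my actual model: f maps R^n -> R^m with n >> m
def myFunction(x):
    return torch.stack([x.pow(2).sum(), x.sin().sum()])

x = torch.randn(1000)

# jacrev builds the full Jacobian from vector-Jacobian products, vectorized with vmap
jacobianMatrix = jacrev(myFunction)(x)  # shape (2, 1000)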

On the other hand, I could use autograd.grad and just compute Jacobian-transpose vector products, using the unit vectors as grad_outputs to extract the individual gradients from the Jacobian. I tested this version too, and it is slower than just computing the full Jacobian.
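Roughly, this is the version I benchmarked (again just a sketch with the same kind of toy stand-in):

import torch

# toy stand-in for the real model
def myFunction(x):
    return torch.stack([x.pow(2).sum(), x.sin().sum()])

x = torch.randn(1000, requires_grad=True)
y = myFunction(x)

rows = []
for i in range(y.numel()):
    e_i = torch.zeros_like(y)
    e_i[i] = 1.0  # unit vector selecting output f_i
    # J^T e_i is exactly grad(f_i) w.r.t. x (one backward pass per output)
    (g,) = torch.autograd.grad(y, x, grad_outputs=e_i, retain_graph=True)
    rows.append(g)

jacobianMatrix = torch.stack(rows)  # same values as functional.jacobian(myFunction, x)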

Is there maybe another method I could try?

Why can I only compute gradients w.r.t. one instance at a time? I don’t know the other inputs in advance, because input1 is calculated from the gradients w.r.t. input0, so vectorizing over inputs won’t help me in this case.

Hi Stev!

It’s not clear to me what you’re asking.

If by “ONE SINGLE input x” you mean that you have a function from R^n to R^m,
f (x_1, x_2, …, x_n) → (f_1 (x_1, x_2, …, x_n), …, f_m (x_1, x_2, …, x_n)), and
you want the partial derivatives with respect to just one of the x_i, say, x_7, then you
should use forward-mode automatic differentiation. If you create a “dual tensor” for x_7,
you will be able to compute all of the derivatives d f_i / d x_7 using just a single
forward-mode forward pass.
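As a rough sketch (with a made-up f and a made-up seed index standing in for your x_7), using torch.autograd.forward_ad:

import torch
import torch.autograd.forward_ad as fwAD

# made-up f: R^3 -> R^2, just for illustration
def f(x):
    return torch.stack([x.pow(2).sum(), x.prod()])

x = torch.randn(3)

# seed a tangent that is 1.0 in the slot playing the role of x_7 (index 1 here)
tangent = torch.zeros_like(x)
tangent[1] = 1.0

with fwAD.dual_level():
    dual_x = fwAD.make_dual(x, tangent)
    out = f(dual_x)
    # the output's tangent holds d f_i / d x_1 for all i, from one forward pass
    primal, jvp = fwAD.unpack_dual(out)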

(If this is not what you are asking, please clarify.)

Best.

K. Frank

Hey Frank!

I just need d f_i(x)/dx (for all i), where x is a vector. With “one single input” I wanted to emphasize that I’m not looking for the fastest method to compute this quantity for batches, but for the fastest one for just a single input.

Best.

Stev

Hi Stev!

Okay, I think I understand now.

You said that the dimensionality of your input, n, is much greater than the dimensionality
of your output, m. Therefore you want to use regular “reverse-mode” autograd, rather
than forward-mode autograd: reverse mode needs one backward pass per output (so m of
them), while forward mode would need one forward pass per input (n of them).

autograd.functional.jacobian() (with vectorize=True and the default strategy of
'reverse-mode') should be the fastest approach. Note that the documentation
suggests that jacrev() is “less experimental,” but the two should be doing basically the same
thing under the hood, namely using a loop to perform a separate backward pass for each
element of the output, but with that loop “vectorized” using vmap().
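In code, that would be something along these lines (with a toy f and x standing in for your model and input):

import torch
from torch.autograd.functional import jacobian

# toy f and x; in your case n >> m
def f(x):
    return torch.stack([x.pow(2).sum(), x.sin().sum()])

x = torch.randn(1000)

# reverse mode (the default strategy), with the per-output backward passes
# batched via vmap rather than an explicit python loop
J = jacobian(f, x, vectorize=True)  # shape (2, 1000)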

Best.

K. Frank


Hi Frank!

In my case, jacrev() really makes my notebook laggy, and I had several kernel crashes running it. I think I read somewhere that jacrev, in contrast to autograd.functional.jacobian(), trades a larger memory footprint for faster execution?

In any case, thank you for your help!

Best.

Stev