Forward Mode AD with multiple tangents for each primal

Hello,

I am currently exploring ways to compute Jacobian-vector products (JVPs) in PyTorch and have come across a few resources on forward-mode AD. However, I am still relatively new to automatic differentiation and find it challenging to fully understand how forward-mode AD operates and how it integrates with the forward pass.

Questions:

  1. Compared to computing a VJP (vector-Jacobian product) with grad in the backward pass, how does forward-mode AD differ in terms of:

    • Speed of computation?
    • Memory usage?
  2. My main goal is to compute JVPs efficiently for multiple tangents per primal, i.e., each primal is associated with multiple tangent vectors. Is there a way to compute JVPs for multiple tangents in parallel without having to recompute f(primal) for each tangent vector?

Any insights or examples would be greatly appreciated!

Thank you!

  1. Speed of computation should be similar. Memory usage for forward-mode AD is lower, since you no longer need to save intermediate activations for a backward pass; the tangent is carried along with the primal values during the forward computation instead. See the short jvp/vjp sketch after this list.

  2. You likely want vectorized JVPs. This is possible using torch.func by combining torch.func.vmap with torch.func.jvp (see the PyTorch documentation for both); a sketch combining them is included below.
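
To make the first point concrete, here is a minimal sketch contrasting the two torch.func APIs on the same function; `f`, `x`, and `v` are illustrative placeholders, not anything from the original post:

```python
import torch
from torch.func import jvp, vjp

def f(x):
    return x.sin()  # placeholder function

x = torch.randn(5)  # primal
v = torch.randn(5)  # tangent (for jvp) / cotangent (for vjp)

# Forward mode: the tangent is pushed through in the same pass that
# computes f(x), so no intermediate activations need to be stored.
out_fwd, jvp_out = jvp(f, (x,), (v,))

# Reverse mode: vjp first runs f and records what the backward pass needs,
# then vjp_fn pulls the cotangent back through that recorded computation.
out_rev, vjp_fn = vjp(f, x)
vjp_out = vjp_fn(v)[0]
```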
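
And here is a minimal sketch of the vectorized version for the second point, assuming a single primal `x` and a batch of tangent vectors (the function `f` and the shapes are again placeholders):

```python
import torch
from torch.func import jvp, vmap

def f(x):
    return x.sin().sum(-1)  # placeholder function

x = torch.randn(5)            # one primal
tangents = torch.randn(8, 5)  # 8 tangent vectors for that same primal

def jvp_single(t):
    # jvp takes tuples of primals and tangents and returns (f(x), J_f(x) @ t).
    return jvp(f, (x,), (t,))

# vmap vectorizes over the leading dimension of `tangents`; x is closed
# over, so it is treated as a single (unbatched) primal.
primal_out, jvp_out = vmap(jvp_single)(tangents)  # one JVP per tangent
```

For the special case where the tangents are the standard basis vectors, torch.func.jacfwd builds the full Jacobian in essentially this way.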