Hello,
I am currently exploring ways to compute Jacobian-vector products (JVP) in PyTorch. I recently came across the following resources:
However, I am still relatively new to auto-differentiation and find it challenging to fully understand how forward-mode AD operates and integrates with the forward pass.
Questions:
- Compared to computing a VJP (vector-Jacobian product) with `grad` in the backward pass, how does forward-mode AD differ in terms of (see the first sketch below for how I am calling both APIs):
  - Speed of computation?
  - Memory usage?
- My main goal is to compute JVPs efficiently for multiple tangents per primal, i.e., each primal is associated with several tangent vectors. Is there a way to compute the JVPs for all of a primal's tangents in parallel, without recomputing `f(primal)` once for each tangent vector? (A minimal sketch of what I mean is below, after this list.)
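To make the first question concrete, here is a minimal sketch of how I currently understand the two APIs, using `torch.func.jvp` and `torch.func.vjp`; the toy function `f` is just a stand-in for my real model:

```python
import torch
from torch.func import jvp, vjp

def f(x):
    # toy stand-in for the real forward pass
    return torch.sin(x) * x

x = torch.randn(5)   # primal
v = torch.randn(5)   # tangent (forward mode) / cotangent (reverse mode)

# Forward mode: a single augmented forward pass gives f(x) and J(x) @ v together.
out_fwd, jvp_out = jvp(f, (x,), (v,))

# Reverse mode: the forward pass records the graph, then vjp_fn runs the backward pass.
out_rev, vjp_fn = vjp(f, x)
(vjp_out,) = vjp_fn(v)   # v^T @ J(x)
```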
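And for the second question, this is roughly what I am doing now versus what I am hoping is possible (again with a placeholder `f`; I am not sure whether the `vmap` version actually shares the primal computation, which is exactly what I would like to know):

```python
import torch
from torch.func import jvp, vmap

def f(x):
    # placeholder for the real model
    return torch.sin(x) * x

x = torch.randn(5)            # one primal
tangents = torch.randn(8, 5)  # several tangent vectors for that single primal

# Naive approach: loop over the tangents; every jvp call re-evaluates f(x).
jvps_loop = torch.stack([jvp(f, (x,), (t,))[1] for t in tangents])

# What I am hoping works: batch the tangents with vmap so the JVPs are
# computed in parallel rather than one jvp call per tangent.
jvps_batched = vmap(lambda t: jvp(f, (x,), (t,))[1])(tangents)
```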
Any insights or examples would be greatly appreciated!
Thank you!