Hi everyone, am I correct that there is currently no way to efficiently compute the Hessian vector product using Pearlmutter trick (from Fast Exact Multiplication by the Hessian), i.e. approximately same time complexity as two gradient computations, due to missing forward-mode AD in PyTorch? Is there any way to work around this?
I am aware that I can simply use plain AD to compute the Hessian vector product (which will be slow), and that I can use finite difference approximation (which will be inexact and numerically unstable). I’ve also seen the R-operator and L-operator defined in a similiar issue ([Adding functionality] Hessian and Fisher Information vector products) and how they replace the R-operator with two L-operator calls, however this still won’t be O(N) or am I missing something? Thanks for your input!