Thanks in advance for your help and suggestions!
I’m wondering if there’s a way to compute the HVP where we may have multiple vectors to be right-multiplied by the Hessian, using a single call to torch.autograd.grad
?
Concretely, suppose we are given a gradient vector grad
w.r.t parameter x
, we can compute the HVP by
torch.autograd.grad(grad, x, grad_outputs=v)
which will yield Hv
, where H
is the Hessian w.r.t x.
Suppose that, instead of a single v
, we are given a sequence of tensors (v1, v2, ... , vm)
such that each vi.size() == grad.size()
, is there an efficient way to compute (Hv1, Hv2, ... , Hvm)
efficiently, in a single call to torch.autograd.grad
analogous to the computation of Hv
?
A naive way seems to be feeding each vi
's into a single call to torch.autograd.grad
, but I’m curious to hear if there’s a more efficient implementation.