Compute Hessian-vector products for multiple vectors in grad_outputs

Thanks in advance for your help and suggestions!

I’m wondering if there’s a way to compute Hessian-vector products (HVPs) for several vectors at once, each right-multiplied by the same Hessian, using a single call to torch.autograd.grad?

Concretely, suppose we are given a gradient vector grad w.r.t. a parameter x, computed with create_graph=True so that it can be differentiated again. We can then compute the HVP by

torch.autograd.grad(grad, x, grad_outputs=v)

which will yield Hv, where H is the Hessian w.r.t x.
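For reference, here is a minimal sketch of that single-vector case. The function f and the values of x and v are made up for illustration; f is chosen so that the Hessian is simply diag(6 * x) and the result is easy to check by hand.

```python
import torch

def f(x):
    # Hypothetical scalar test function; its Hessian is diag(6 * x).
    return (x ** 3).sum()

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
v = torch.tensor([1.0, 0.0, -1.0])

# create_graph=True records the graph of the gradient itself,
# so the gradient can be differentiated a second time.
(grad,) = torch.autograd.grad(f(x), x, create_graph=True)

# This is a vector-Jacobian product of grad w.r.t. x, i.e. H^T v,
# which equals H v because the Hessian is symmetric.
(hv,) = torch.autograd.grad(grad, x, grad_outputs=v)
```

Here hv has the same shape as x and equals 6 * x * v elementwise.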

Suppose that, instead of a single v, we are given a sequence of tensors (v1, v2, ... , vm) such that each vi.size() == grad.size(). Is there an efficient way to compute (Hv1, Hv2, ... , Hvm) in a single call to torch.autograd.grad, analogous to the computation of Hv?

A naive way seems to be feeding each vi into its own call to torch.autograd.grad, but I’m curious to hear if there’s a more efficient implementation. :slight_smile:
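A sketch of both options, under some assumptions: the toy function f and the vectors are made up, and the single-call variant relies on the is_grads_batched argument of torch.autograd.grad, which exists only in reasonably recent PyTorch releases and is documented as experimental (it uses vmap internally and may not support every operation). The loop is the naive baseline described above.

```python
import torch

def f(x):
    # Hypothetical scalar test function; its Hessian is diag(6 * x).
    return (x ** 3).sum()

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

# Stack the m vectors along a new leading dimension: shape (m, *grad.shape).
vs = torch.stack([
    torch.tensor([1.0, 0.0, 0.0]),
    torch.tensor([0.0, 1.0, 1.0]),
])

(grad,) = torch.autograd.grad(f(x), x, create_graph=True)

# Naive baseline: one call per vector.
# retain_graph=True keeps the graph alive between the calls.
hvs_loop = torch.stack([
    torch.autograd.grad(grad, x, grad_outputs=v, retain_graph=True)[0]
    for v in vs
])

# Single call: is_grads_batched=True interprets dim 0 of grad_outputs
# as a batch of vectors and returns one product per vector.
(hvs_batched,) = torch.autograd.grad(
    grad, x, grad_outputs=vs, is_grads_batched=True
)
```

Both results have shape (m, *x.shape), with row i holding H vi. Whether the batched call is actually faster depends on the model and the PyTorch version, so it is worth timing against the loop.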

I’m sorry, but I haven’t had time to test the following idea, which I think could work:

  • Concatenate the model parameters n times (i.e. repeat)
  • v = torch.cat((v0, v1, …, vn)).view(1, -1)
  • Use this: https://github.com/LeviViana/torchessian/blob/master/torchessian/__init__.py#L6-L27

Even if it works, I’m not sure it would provide any performance gains, though.
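One possible reading of the repetition idea, sketched on a made-up toy function rather than on the linked code: give each vector its own copy of the parameters, sum the function over the copies, and take one HVP of that sum. The Hessian of the sum is block-diagonal with H in every block, so a single double-backward call returns all the products. The names f, xs and total are illustrative only.

```python
import torch

def f(x):
    # Hypothetical scalar test function; its Hessian is diag(6 * x).
    return (x ** 3).sum()

x = torch.tensor([1.0, 2.0, 3.0])
vs = torch.stack([
    torch.tensor([1.0, 0.0, 0.0]),
    torch.tensor([0.0, 1.0, 1.0]),
])  # shape (m, n)

# Repeat the parameters once per vector; each row is an independent copy.
xs = x.repeat(vs.shape[0], 1).requires_grad_(True)  # shape (m, n)

# The copies do not interact, so the Hessian of the sum w.r.t. xs is
# block-diagonal with the Hessian of f at x in each block.
total = sum(f(row) for row in xs)

(grad,) = torch.autograd.grad(total, xs, create_graph=True)

# One HVP against the stacked vectors gives H v_i in row i.
(hvs,) = torch.autograd.grad(grad, xs, grad_outputs=vs)
```

This still evaluates f once per copy, so memory and forward cost grow with m, which matches the doubt about performance gains.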