What's the most efficient way to get the Hessian of the loss with respect to all parameters for all input samples?

Hi All,

I was wondering what’s the most efficient way to get the Hessian of the loss with respect to all parameters, for every sample individually? I’ve done something similar for the Jacobian of my network: I registered hooks and manually built the per-sample gradients from the hooked terms inside an einsum op.
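For comparison, here is a minimal sketch of a different approach from the hook/einsum one described above: PyTorch's `torch.func` transforms (assumed PyTorch 2.0 or later). It flattens all parameters into one vector, writes the loss of a single sample as a function of that vector via `functional_call`, takes `hessian` of it, and uses `vmap` to map that over the batch. The tiny `nn.Sequential` model, the squared-error loss, and the helper names (`unflatten`, `sample_loss`) are hypothetical placeholders, not anything from the original setup.

```python
import torch
import torch.nn as nn
from torch.func import functional_call, hessian, vmap

torch.manual_seed(0)
# Hypothetical toy model; swap in your own nn.Module.
model = nn.Sequential(nn.Linear(3, 4), nn.Tanh(), nn.Linear(4, 1))

# Record parameter names/shapes so a flat vector can be mapped back to a dict.
names = [n for n, _ in model.named_parameters()]
shapes = [p.shape for _, p in model.named_parameters()]
numels = [p.numel() for _, p in model.named_parameters()]
flat_params = torch.cat([p.detach().reshape(-1) for p in model.parameters()])

def unflatten(flat):
    # Split the flat vector back into named parameter tensors.
    chunks = torch.split(flat, numels)
    return {n: c.reshape(s) for n, c, s in zip(names, chunks, shapes)}

def sample_loss(flat, x, y):
    # Loss for ONE sample, as a function of all parameters (flattened).
    # Hypothetical squared-error loss; replace with your own.
    out = functional_call(model, unflatten(flat), (x.unsqueeze(0),))
    return ((out - y.unsqueeze(0)) ** 2).mean()

# hessian() differentiates w.r.t. argument 0 (flat params) twice;
# vmap maps over the batch dimension of x and y while sharing the params.
per_sample_hessian = vmap(hessian(sample_loss), in_dims=(None, 0, 0))

x = torch.randn(8, 3)
y = torch.randn(8, 1)
H = per_sample_hessian(flat_params, x, y)
print(H.shape)  # torch.Size([8, 21, 21]): batch x P x P, with P = 21 parameters
```

Note that the result is batch × P × P, so memory grows quadratically with the parameter count; for a realistic network it is usually only practical on small models or on a subset of the parameters (for example the last layer).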

Any help would be greatly appreciated!

Thank you!