Computing Hessian for loss function

Yaroslav_Bulatov · January 21, 2020, 12:07am

There’s a solution from Adam which does regular for-loop instead of nested for-loop – https://gist.github.com/apaszke/226abdf867c4e9d6698bd198f3b45fb7

A proposal to provide hessians native in PyTorch by @albanD https://github.com/pytorch/pytorch/issues/30632

You can replace k-backward calls with a single backward call by putting k things in the batch dimension, however, this also increases your memory usage by a factor of k, see this thread – Efficient computation with multiple grad_output's in autograd.grad

There’s a fundamental problem of Hessian being large. Take resnet-50 which has 25M parameters, Hessian then has 625 trillion entries. This means for large networks you have to deal with factorized approximations or consider a subset of the entries like the diagonal, which can be obtained at a similar cost to the gradient.

IE, for ReLU networks, you can get diagonal exactly using Gauss-Newton trick implemented here and for more general networks you can use Hutchison estimator through Hessian-vector products like in this colab