Full loss Hessian spectrum estimation

For those interested in analyzing the loss landscapes of modern neural nets: this article introduces a method for estimating the full spectrum of the loss Hessian. I've implemented a simple version of the method here: https://github.com/LeviViana/torchessian.

This method brings unprecedented clarity to the landscapes of very high-dimensional losses; even the simple examples I've published in the repo wouldn't have been feasible a year ago.
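The core trick that makes full-spectrum estimation tractable is that the Hessian is only ever touched through Hessian-vector products, which a Lanczos iteration turns into a small tridiagonal matrix whose eigenvalues (Ritz values) approximate the Hessian's spectrum. Here is a minimal sketch of that idea on a toy symmetric matrix with NumPy; the function and variable names are my own, and in a real setting `matvec` would be a Hessian-vector product computed via double backprop (e.g. `torch.autograd.grad`), as in the repo above:

```python
import numpy as np

def lanczos_spectrum(matvec, dim, m, rng):
    """Approximate the spectrum of a symmetric operator with m Lanczos
    steps, touching the operator only through matrix-vector products."""
    v = rng.standard_normal(dim)
    v /= np.linalg.norm(v)
    v_prev = np.zeros(dim)
    alphas, betas = [], []
    beta = 0.0
    for _ in range(m):
        w = matvec(v)
        alpha = v @ w                      # diagonal entry of T
        w = w - alpha * v - beta * v_prev  # three-term recurrence
        # (full reorthogonalization omitted for brevity)
        alphas.append(alpha)
        beta = np.linalg.norm(w)
        if beta < 1e-10:                   # invariant subspace found
            break
        betas.append(beta)
        v_prev, v = v, w / beta
    k = len(alphas)
    T = (np.diag(alphas)
         + np.diag(betas[:k - 1], 1)
         + np.diag(betas[:k - 1], -1))
    # Eigenvalues of the small tridiagonal T are the Ritz values,
    # which approximate the operator's spectrum (extremes converge first).
    return np.linalg.eigvalsh(T)

# Toy "Hessian": symmetric matrix with known eigenvalues 1..50.
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((50, 50)))
H = Q @ np.diag(np.arange(1.0, 51.0)) @ Q.T
ritz = lanczos_spectrum(lambda x: H @ x, dim=50, m=20, rng=rng)
print(ritz.min(), ritz.max())  # extreme Ritz values approach 1 and 50
```

For an actual neural net loss, the same loop runs with the matvec replaced by an autograd Hessian-vector product, so the full Hessian is never materialized.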

As example applications, you can study how batch-norm layers change the shape of the loss landscape, how well the spectrum of a single batch approximates that of the full dataset, how decreasing the learning rate pushes the network toward a point where the curvatures are not necessarily larger, or how hard-to-separate data points produce large Hessian eigenvalues, etc.

I hope you'll have as much fun as I do playing around with loss landscapes!


Thanks for sharing your code here! It looks really interesting :slight_smile:
