Compute the Hessian matrix of a network

scidom · December 8, 2019, 10:06am

As mentioned earlier in the thread, computing the Hessian via autograd is expected to be slow. The main idea is to call grad() on the output of grad(). The snippet posted above makes two nested Jacobian calls, which is conceptually less clear when you first encounter the question of how to compute the Hessian. The main thing to avoid is calling backward() and then grad() on the result instead of calling grad() twice in a nested fashion. For the former to work, backward() must be called with create_graph=True, which stores graph-carrying gradients in each parameter's .grad and creates a reference cycle between the parameter and its gradient that can leak memory.
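A minimal sketch of the nested grad() idea for a scalar function of a single 1-D tensor. The helper name hessian_via_nested_grad is hypothetical, and the toy function is chosen only so the result can be checked by hand (its Hessian is [[2*x1, 2*x0], [2*x0, 6*x1]]).

```python
import torch
from torch.autograd import grad


def hessian_via_nested_grad(f, x):
    """Hessian of a scalar function f at a 1-D tensor x, via grad() on grad().

    Hypothetical helper for illustration only.
    """
    x = x.detach().requires_grad_(True)
    y = f(x)
    # First derivative; create_graph=True records the gradient computation
    # so that it can itself be differentiated.
    (g,) = grad(y, x, create_graph=True)
    rows = []
    for i in range(x.numel()):
        # Second derivative: differentiate the i-th gradient entry w.r.t. x.
        # retain_graph=True keeps the graph of g alive for the remaining rows.
        (row,) = grad(g[i], x, retain_graph=True)
        rows.append(row)
    return torch.stack(rows)


# Toy function (assumption for illustration): f(x) = x0^2 * x1 + x1^3.
def f(x):
    return x[0] ** 2 * x[1] + x[1] ** 3


H = hessian_via_nested_grad(f, torch.tensor([1.0, 2.0]))
print(H)  # expected values: [[4, 2], [2, 12]]
```

Each row of the Hessian costs one extra backward pass, so the cost grows with the number of inputs, which is part of why this approach is slow for networks with many parameters.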

Here are two methods of a model that may help you understand how to compute the Hessian:

github.com/papamarkou/eeyore/blob/master/eeyore/api/model.py#L148

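# Excerpt; the methods below use torch and torch.autograd.grad:
import torch
from torch.autograd import grad


    # ... tail of log_target(theta, dataloader) (excerpt; other lines,
    # including the one defining log_prior_val, are not shown)
    x, y, _ = next(iter(dataloader))
    log_lik_val = self.log_lik(x, y)

    if self.constraint is not None:
        self.set_params(theta)

    return log_lik_val + log_prior_val


def grad_log_target(self, log_target_val):
    # create_graph=True records the gradient computation so it can be differentiated again
    grad_log_target_val = grad(log_target_val, self.parameters(), create_graph=True)
    # Flatten the per-parameter gradients into one vector
    grad_log_target_val = torch.cat([g.view(-1) for g in grad_log_target_val])
    return grad_log_target_val


def upto_grad_log_target(self, theta, dataloader):
    log_target_val = self.log_target(theta, dataloader)
    grad_log_target_val = self.grad_log_target(log_target_val)
    return log_target_val, grad_log_target_val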

github.com/papamarkou/eeyore/blob/master/eeyore/api/model.py#L158

def hess_log_target(self, grad_log_target_val):
    n_params = self.num_params()

    hess_log_target_val = []
    for i in range(n_params):
        # Row i: derivative of the i-th gradient entry w.r.t. all parameters;
        # retain_graph=True keeps the gradient's graph alive for the next row
        deriv_i_wrt_grad = grad(grad_log_target_val[i], self.parameters(), retain_graph=True)
        hess_log_target_val.append(torch.cat([h.view(-1) for h in deriv_i_wrt_grad]))
    # Concatenate the n_params rows and reshape into an (n_params, n_params) matrix
    hess_log_target_val = torch.cat(hess_log_target_val, 0).reshape(n_params, n_params)

    return hess_log_target_val

The grad_log_target_val input argument to hess_log_target() is the output of grad_log_target().
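A standalone sketch of the same two steps outside a model class. The helper names flat_grad and flat_hessian are hypothetical stand-ins for grad_log_target() and hess_log_target(), and the setup (a bias-free nn.Linear(3, 1) with a mean-squared-error loss) is an assumption chosen because its Hessian with respect to the weights is known in closed form: 2/N * X^T X.

```python
import torch
from torch import nn
from torch.autograd import grad


def flat_grad(loss, params):
    # Hypothetical helper: gradient of loss w.r.t. params, flattened into one vector.
    # create_graph=True so the gradient itself can be differentiated again.
    grads = grad(loss, params, create_graph=True)
    return torch.cat([g.reshape(-1) for g in grads])


def flat_hessian(flat_g, params):
    # Hypothetical helper: one row per gradient entry, as in hess_log_target().
    rows = []
    for i in range(flat_g.numel()):
        # Row i: derivative of the i-th gradient entry w.r.t. every parameter.
        row = grad(flat_g[i], params, retain_graph=True)
        rows.append(torch.cat([r.reshape(-1) for r in row]))
    return torch.stack(rows)


# Toy setup (assumption): linear model without bias, least-squares loss.
torch.manual_seed(0)
model = nn.Linear(3, 1, bias=False)
X = torch.randn(8, 3)
y = torch.randn(8, 1)
loss = ((model(X) - y) ** 2).mean()

params = list(model.parameters())
H = flat_hessian(flat_grad(loss, params), params)

# For this loss the exact Hessian w.r.t. the weights is 2 / N * X^T X.
expected = 2.0 / X.shape[0] * X.T @ X
print(H)
print(expected)
```

The parameters are collected into a list because they are reused for every row; a model.parameters() generator would be exhausted after the first grad() call.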

If you want a standalone script that does not involve a model, see my Jupyter notebook below, which walks step by step through an example of computing the Hessian:

github.com/papamarkou/eeyore/blob/master/examples/checks/autograd/hessian.ipynb

Notebook preview (truncated): the first code cell imports torch and grad from torch.autograd, followed by a markdown cell titled "Comparison between analytical and autograd gradient".

iacob · March 31, 2021, 10:50am

Use PyTorch’s autograd.functional library:

torch.autograd.functional.hessian(func, inputs)
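A minimal usage sketch, assuming PyTorch 2.0 or later (needed for torch.func.functional_call). The first part differentiates a plain function of one tensor; the second treats a network's weight as the input by wrapping the loss in a function. The names cubic_sum and loss_wrt_weight and the bias-free nn.Linear(3, 1) with a mean-squared-error loss are hypothetical, chosen so the result can be checked against 2/N * X^T X.

```python
import torch
from torch import nn
from torch.autograd.functional import hessian
from torch.func import functional_call  # assumes PyTorch >= 2.0


# 1) Scalar function of one tensor: the Hessian has shape x.shape + x.shape.
def cubic_sum(x):  # hypothetical toy function
    return (x ** 3).sum()


H_f = hessian(cubic_sum, torch.tensor([1.0, 2.0, 3.0]))
print(H_f)  # expected: diag(6, 12, 18), i.e. 6 * x on the diagonal

# 2) Loss of a network as a function of its weight (hypothetical toy setup).
torch.manual_seed(0)
model = nn.Linear(3, 1, bias=False)
X = torch.randn(8, 3)
y = torch.randn(8, 1)


def loss_wrt_weight(w):
    # Run the module with w substituted for its "weight" parameter.
    out = functional_call(model, {"weight": w}, (X,))
    return ((out - y) ** 2).mean()


w0 = model.weight.detach()
H_w = hessian(loss_wrt_weight, w0)  # shape (1, 3, 1, 3) = w0.shape + w0.shape
H_w = H_w.reshape(w0.numel(), w0.numel())  # flatten to a (3, 3) matrix
print(H_w)
```

Compared with the nested grad() loop, hessian() hides the per-row backward passes; it also accepts vectorize=True, which may speed things up for larger inputs.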

tengerye · April 26, 2021, 7:50am

@iacob Hi, could you please share an example?