I would like to calculate the Hessian vector product, where the Hessian is the second-derivative matrix of the loss function of some neural net, and the vector will be the vector of gradients of that loss function.
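
To make the target concrete, here is the identity this computation relies on, as I understand it. The symbols are my own notation: \theta for the flattened parameters, L for the loss, and v for a vector that is treated as a constant (here, a detached copy of the gradient).

```latex
g(\theta) = \nabla_\theta L(\theta), \qquad
H(\theta) = \nabla_\theta^2 L(\theta), \qquad
\nabla_\theta \bigl( g(\theta)^\top v \bigr) = H(\theta)\, v
\quad \text{for constant } v .
```

The last equality uses the symmetry of H, so differentiating the scalar g^T v once more gives the Hessian-vector product without ever forming H.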

I know how to calculate the Hessian vector product for a regular function thanks to this post. However, I am running into trouble when the function is the loss function of a neural network. This is because the parameters are packaged into a module and are accessible only via the module's parameters() method, which yields several tensors rather than a single torch tensor.

I want to do something like this (doesn’t work):

## a simple neural network
linear = nn.Linear(10, 20)
x = torch.randn(1, 10)
y = linear(x).sum()

## compute the gradient and make a copy that is detached from the graph
grad = torch.autograd.grad(y, linear.parameters(), create_graph=True)
v = grad.clone().detach()

## compute the Hessian vector product
z = grad @ v
z.backward()

By analogy with this (which does work):

import torch

x = torch.tensor([1.0, 1.0], requires_grad=True)
f = 3*x[0]**2 + 4*x[0]*x[1] + x[1]**2
grad, = torch.autograd.grad(f, x, create_graph=True)
v = grad.clone().detach()
z = grad @ v
z.backward()  # x.grad now holds the Hessian-vector product H @ v

This post addresses a similar (possibly the same?) issue, but I don’t understand the resolution.
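
For what it is worth, here is a minimal sketch of one way the module version might be made to work. It is not a confirmed resolution. It rests on two assumptions. First, torch.autograd.grad returns a tuple with one tensor per parameter, so the pieces are flattened and concatenated before taking the dot product. Second, the loss is changed to a squared output, because linear(x).sum() is linear in the parameters and its Hessian is zero. The names params, flat_grad, hvp and flat_hvp are illustrative only.

```python
import torch
from torch import nn

torch.manual_seed(0)

# A small network. The squared output is an assumption made for this sketch
# so that the loss is nonlinear in the parameters (non-zero Hessian).
linear = nn.Linear(10, 20)
x = torch.randn(1, 10)
params = list(linear.parameters())  # materialize the generator once

loss = (linear(x) ** 2).sum()

# One gradient tensor per parameter; keep the graph for a second derivative.
grads = torch.autograd.grad(loss, params, create_graph=True)

# Flatten the tuple into a single vector so that `@` is defined.
flat_grad = torch.cat([g.reshape(-1) for g in grads])
v = flat_grad.detach().clone()  # constant copy of the gradient

# Differentiating (grad . v) with v held constant gives H @ v.
z = flat_grad @ v
hvp = torch.autograd.grad(z, params)  # tuple shaped like the parameters
flat_hvp = torch.cat([h.reshape(-1) for h in hvp])
```

Calling z.backward() instead of the second torch.autograd.grad should leave the same per-parameter pieces in each parameter's .grad attribute, accumulated on top of whatever is already stored there.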