Calculating the Hessian vector product with nn.parameters()

I would like to calculate the Hessian vector product, where the Hessian is the second-derivative matrix of the loss function of some neural net, and the vector will be the vector of gradients of that loss function.

I know how to calculate the Hessian vector product for a regular function thanks to this post. However, I am running into trouble when the function is the loss function of a neural network. This is because the parameters are packaged into a module and exposed through its parameters() method as an iterator over several tensors, rather than as a single torch tensor.
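
To make the packaging issue concrete, here is a small sketch of what parameters() yields for a single linear layer. The layer sizes match the example in this post. The flattening helper torch.nn.utils.parameters_to_vector is shown only for illustration and is not something the original post uses.

```python
import torch
import torch.nn as nn

linear = nn.Linear(10, 20)

# parameters() yields one Parameter (a Tensor subclass) per weight or bias,
# each with its own shape.
shapes = [tuple(p.shape) for p in linear.parameters()]
print(shapes)  # [(20, 10), (20,)]

# The parameters can be concatenated into one flat vector, but the result
# is a new tensor computed from the parameters, not a leaf the module uses
# in its forward pass.
flat = torch.nn.utils.parameters_to_vector(linear.parameters())
print(flat.shape)    # torch.Size([220])
print(flat.is_leaf)  # False
```

Because the module's forward pass goes through the individual parameters and not through the flat vector, gradients have to be requested with respect to the individual parameter tensors.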

I want to do something like this (doesn’t work):

import torch
import torch.nn as nn

# a simple neural network
linear = nn.Linear(10, 20)
x = torch.randn(1, 10)
y = linear(x).sum()

# compute the gradient and make a copy that is detached from the graph
grad = torch.autograd.grad(y, linear.parameters(), create_graph=True)
v = grad.clone().detach()

# compute the Hessian vector product
z = grad @ v
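
For comparison, here is a minimal sketch of one way the same idea can be made to run. torch.autograd.grad returns a tuple with one tensor per parameter, so the clone and the inner product are done per tensor and summed. Two things here are assumptions and not part of the original snippet: the loss is changed to a squared output, because linear(x).sum() is linear in the parameters and its Hessian is identically zero, and the names params, grads and hvp are made up for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

linear = nn.Linear(10, 20)
x = torch.randn(1, 10)

# Assumption: a squared output replaces the original .sum() loss so that
# the gradient actually depends on the parameters (nonzero Hessian).
y = linear(x).pow(2).sum()

params = list(linear.parameters())

# One gradient tensor per parameter, with the graph kept for a second pass.
grads = torch.autograd.grad(y, params, create_graph=True)

# Detached copies of the gradients play the role of the vector v.
v = [g.detach().clone() for g in grads]

# Scalar inner product <grad, v>, accumulated over all parameter tensors.
z = sum((g * vi).sum() for g, vi in zip(grads, v))

# Differentiating the inner product with respect to the parameters gives
# the Hessian vector product, again as one tensor per parameter.
hvp = torch.autograd.grad(z, params)

for p, h in zip(params, hvp):
    print(tuple(p.shape), tuple(h.shape))
```

Each entry of hvp has the same shape as the corresponding parameter. If a single flat vector is needed, the entries can be flattened and concatenated afterwards.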

In analogy to this (does work):

import torch
from torch.autograd import Variable

x = Variable(torch.Tensor([1, 1]), requires_grad=True)
f = 3*x[0]**2 + 4*x[0]*x[1] + x[1]**2
grad, = torch.autograd.grad(f, x, create_graph=True)
v = grad.clone().detach()
z = grad @ v
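
Note that z in this snippet is only the scalar inner product of the gradient with v. A sketch of the remaining step, under the assumption that the intended result is H v: differentiate z with respect to x once more. The check against the hand-written Hessian [[6, 4], [4, 2]] of this particular f is added here for illustration, and torch.tensor is used in place of Variable.

```python
import torch

x = torch.tensor([1.0, 1.0], requires_grad=True)
f = 3 * x[0] ** 2 + 4 * x[0] * x[1] + x[1] ** 2

# First derivative, keeping the graph so it can be differentiated again.
grad, = torch.autograd.grad(f, x, create_graph=True)  # [10., 6.] at x = (1, 1)
v = grad.detach().clone()

# Scalar inner product <grad, v>; v is treated as a constant.
z = grad @ v

# Differentiating z with respect to x yields the Hessian vector product H v.
hvp, = torch.autograd.grad(z, x)

# Check against the analytic Hessian of f, which is constant for a quadratic.
hessian = torch.tensor([[6.0, 4.0], [4.0, 2.0]])
print(hvp)          # tensor([84., 52.])
print(hessian @ v)  # tensor([84., 52.])
```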

This post addresses a similar (possibly the same?) issue, but I don’t understand the resolution.

@gshartnett were you able to solve this issue? I am facing a similar issue too.

I created a related post and the replies helped me figure out my problem.

Thanks! I also solved the issue similarly.