I would like to calculate the Hessian vector product, where the Hessian is the second-derivative matrix of the loss function of some neural net, and the vector will be the vector of gradients of that loss function.
I know how to calculate the Hessian vector product for a regular function thanks to this post. However, I am running into trouble when the function is the loss function of a neural network. This is because the parameters are packaged into a module, accessible via nn.parameters(), and not a torch tensor.
I want to do something like this (doesn’t work):
a simple neural network
linear = nn.Linear(10, 20)
x = torch.randn(1, 10)
y = linear(x).sum()compute the gradient and make a copy that is detached from the graph
grad = torch.autograd.grad(y, linear.parameters(), create_graph=True)
v = grad.clone().detach()compute the Hessian vector product
z = grad @ v
z.backward()
In analogy this this (does work):
x = Variable(torch.Tensor([1, 1]), requires_grad=True)
f = 3x[0]**2 + 4x[0]*x[1] + x[1]**2
grad, = torch.autograd.grad(f, x, create_graph=True)
v = grad.clone().detach()
z = grad @ v
z.backward()
This post addresses a similar (possibly the same?) issue, but I don’t understand the resolution.