I’m looking at an implementation for calculating the Hessian matrix of the loss function.
loss = self.loss_function()
loss.backward(retain_graph=True)
grad_params = torch.autograd.grad(loss, p, create_graph=True)  # p is the weight matrix for a particular layer
hess_params = torch.zeros_like(grad_params[0])
for i in range(grad_params[0].size(0)):
    for j in range(grad_params[0].size(1)):
        hess_params[i, j] = torch.autograd.grad(grad_params[0][i][j], p, retain_graph=True)[0][i, j]
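For context, here is a minimal self-contained version of that pattern. The weight `p`, the input `x`, and the quadratic loss are all made up for illustration; the point is just to show what the nested loop computes, namely the "diagonal" second derivatives d²loss/dp[i,j]², one entry per grad call:

```python
import torch

# Hypothetical toy setup: one weight matrix and a scalar quadratic loss.
torch.manual_seed(0)
p = torch.randn(3, 4, requires_grad=True)  # stand-in for a layer's weight
x = torch.randn(5, 4)                      # stand-in input batch
loss = (x @ p.t()).pow(2).sum()            # scalar loss

# First derivative, with create_graph=True so we can differentiate again.
grad_params = torch.autograd.grad(loss, p, create_graph=True)

# Second derivatives d^2 loss / d p[i,j]^2, one scalar grad call per
# entry -- this is why the quoted implementation needs a nested loop:
# autograd.grad only differentiates scalar (or weighted) outputs.
hess_params = torch.zeros_like(grad_params[0])
for i in range(grad_params[0].size(0)):
    for j in range(grad_params[0].size(1)):
        hess_params[i, j] = torch.autograd.grad(
            grad_params[0][i][j], p, retain_graph=True
        )[0][i, j]
```

Note that this only collects the elementwise diagonal of the full Hessian, not the mixed partials d²loss/(dp[i,j] dp[k,l]).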
I have 3 questions:

Why do we compute the Hessian in a loop? Can't we use something along the lines of
hess_params = torch.autograd.grad(grad_params, p, retain_graph=True)

The current setup takes hours to run for larger weight matrices. What can I do to speed the code up?

I have seen that a hessian function has been implemented in the autograd package. How can we use it in this case?
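For reference, here is a sketch of how torch.autograd.functional.hessian could be applied, again with a made-up weight and loss for illustration. It expects the loss written as a pure function of the parameter and returns the full Hessian, from which the elementwise diagonal (what the loop above computes) can be extracted:

```python
import torch
from torch.autograd.functional import hessian

torch.manual_seed(0)
p = torch.randn(3, 4)  # stand-in for a layer's weight (plain tensor input)
x = torch.randn(5, 4)  # stand-in input batch

def loss_fn(w):
    # Hypothetical scalar loss expressed as a function of the weight only.
    return (x @ w.t()).pow(2).sum()

# Full Hessian of shape (3, 4, 3, 4): d^2 loss / (dp[i,j] dp[k,l]),
# computed in one call instead of one grad call per entry.
H = hessian(loss_fn, p)

# Elementwise diagonal d^2 loss / dp[i,j]^2 via repeated einsum indices.
diag = torch.einsum('ijij->ij', H)
```

Whether this is faster in practice depends on the parameter size: the full Hessian has (3*4)² entries here, so for very large weight matrices materializing it may be infeasible and the diagonal-only loop (or a Hessian-vector-product approach) may still be the pragmatic choice.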
Pointers to reading resources and similar questions would also be highly appreciated.