I’m looking at an implementation for calculating the Hessian matrix of the loss function.
```python
loss = self.loss_function()
loss.backward(retain_graph=True)
# p is the weight matrix for a particular layer;
# create_graph=True lets us differentiate the gradients again.
grad_params = torch.autograd.grad(loss, p, create_graph=True)[0]
hess_params = torch.zeros_like(grad_params)
for i in range(grad_params.size(0)):
    for j in range(grad_params.size(1)):
        # grad() returns a tuple, so take [0] before indexing
        hess_params[i, j] = torch.autograd.grad(grad_params[i, j], p, retain_graph=True)[0][i, j]
```
I have three questions:

- Why do we compute the Hessian in a loop? Can't we use something along the lines of

  ```python
  hess_params = torch.autograd.grad(grad_params, p, retain_graph=True)
  ```

- The current setup takes hours to run for larger weight matrices. What can I do to speed up the code?
- I have seen that a `hessian` function has been implemented in the `autograd` package. How can we use it in this case?
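For reference, here is a minimal standalone sketch of what I have in mind with `torch.autograd.functional.hessian` (the layer, loss, and shapes below are hypothetical toy stand-ins, not my actual model):

```python
import torch
from torch.autograd.functional import hessian

# Toy stand-in for the real setup: a single linear layer with an MSE loss,
# so the Hessian is taken w.r.t. the weight matrix w (playing the role of p).
torch.manual_seed(0)
x = torch.randn(8, 3)  # inputs
y = torch.randn(8, 2)  # targets
w = torch.randn(3, 2)  # weight matrix

def loss_fn(weight):
    # Scalar loss as a function of the weight only
    pred = x @ weight
    return torch.nn.functional.mse_loss(pred, y)

# hessian() computes all second derivatives in one call;
# for an input of shape (3, 2) the result has shape (3, 2, 3, 2).
H = hessian(loss_fn, w)
print(H.shape)  # torch.Size([3, 2, 3, 2])
```

The nested-loop version above, by contrast, only fills in the diagonal entries `∂²loss/∂p[i, j]²`, one backward pass per element, which is where the hours go.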
Pointers to reading resources and similar questions would also be highly appreciated.