Hi, I am trying to compute the Hessian matrix by calling autograd.grad() twice on a variable.
It works fine in a toy example:
import torch
from torch.autograd import Variable

a = torch.FloatTensor([2.0])  # any size-1 tensors work here
b = torch.FloatTensor([3.0])
a, b = Variable(a, requires_grad=True), Variable(b, requires_grad=True)
c = a + 3 * b**2
c = c.sum()
grad_b = torch.autograd.grad(c, b, create_graph=True)        # dc/db = 6 * b
grad2_b = torch.autograd.grad(grad_b, b, create_graph=True)  # d2c/db2 = 6, a [torch.FloatTensor of size 1]
But here is the question: I want to compute the Hessian of a network,
so I define a function:
def calculate_hessian(loss, model):
    var = model.parameters()
    grads = torch.autograd.grad(loss, var, create_graph=True)
    grads = torch.cat([g.view(-1) for g in grads])
    for grad in grads:
        grad2 = torch.autograd.grad(grad, var, create_graph=True)
It returns an empty list. It seems the gradient of grad cannot be computed.
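The likely culprit is that model.parameters() returns a generator: the first autograd.grad() call exhausts it, so var is empty by the time the loop runs, and differentiating with respect to an empty list of inputs gives an empty result. A sketch of a fixed version (it builds the full Hessian row by row, so it is only feasible for small networks):

def calculate_hessian(loss, model):
    # materialize the parameters once: model.parameters() is a generator
    # and would be exhausted after the first autograd.grad() call
    var = list(model.parameters())
    grads = torch.autograd.grad(loss, var, create_graph=True)
    grads = torch.cat([g.contiguous().view(-1) for g in grads])
    hessian_rows = []
    for grad in grads:
        # each element of grads is a scalar, so this yields one row of the Hessian
        grad2 = torch.autograd.grad(grad, var, retain_graph=True, allow_unused=True)
        hessian_rows.append(torch.cat(
            [torch.zeros_like(v).view(-1) if g is None else g.contiguous().view(-1)
             for g, v in zip(grad2, var)]))
    return torch.stack(hessian_rows)

A quick check on a tiny made-up model:

model = torch.nn.Linear(2, 1)  # 3 parameters in total
loss = torch.nn.functional.mse_loss(model(torch.randn(5, 2)), torch.randn(5, 1))
print(calculate_hessian(loss, model).shape)  # torch.Size([3, 3])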
g is an individual gradient, but x is a vector of weights.
Is that intended? Don’t you want to compute a second-order gradient for each individual weight?
However, if I add an x = torch.reshape(x, [-1]) line, I get
> g2 = torch.autograd.grad(g, x, retain_graph=True)
> RuntimeError: One of the differentiated Tensors appears to not have been used in the graph. Set allow_unused=True if this is the desired behavior.
Does anyone know what this means?
EDIT: Oh, I see that you pick g2[count] afterwards to get the diagonal of the Hessian, but I’m still confused why I can’t calculate the gradient of a scalar with respect to a scalar.
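In case it helps: torch.reshape(x, [-1]) does not flatten x in place; it returns a new tensor that lives downstream of x in the graph. Since g was computed from the original x, the reshaped tensor never appears in g’s graph, which is exactly what the RuntimeError is complaining about. A minimal sketch of the situation (the names are illustrative, not your actual code):

import torch

x = torch.randn(3, requires_grad=True)
y = (x ** 2).sum()
g = torch.autograd.grad(y, x, create_graph=True)[0]  # g = 2 * x

x_flat = torch.reshape(x, [-1])  # a NEW tensor downstream of x, never used by g
g2 = torch.autograd.grad(g[0], x_flat, retain_graph=True, allow_unused=True)
print(g2)  # (None,); without allow_unused=True this raises the RuntimeError above

g2_ok = torch.autograd.grad(g[0], x, retain_graph=True)  # differentiate w.r.t. the original x
print(g2_ok)  # (tensor([2., 0., 0.]),) is one Hessian row; g2_ok[0][0] is the diagonal entry

So the scalar-by-scalar gradient is fine; the problem is that the reshaped x is a different node from the one g depends on.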
So, will it be integrated into the loss function, or work as a layer of a CNN? Do you have any reference papers that use the Hessian to improve accuracy? I am also interested in classification/detection/segmentation.
@tengerye even though Gauss-Newton is cheap to compute, the matrix is typically too large to store explicitly, so you’ll additionally need some kind of structured approximation. Here’s an example of computing diagonal and KFAC approximations of Gauss-Newton for linear layers: https://github.com/cybertronai/autograd-lib#autograd_lib
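For a flavor of what a structured approximation looks like, here is a rough plain-PyTorch sketch of one diagonal approximation (the empirical Fisher, i.e. averaged squared per-sample gradients). Note this is not autograd-lib’s API (see its README for that), and the model and batch below are made up for illustration:

import torch
import torch.nn.functional as F

model = torch.nn.Linear(4, 3)
x, y = torch.randn(8, 4), torch.randint(0, 3, (8,))

# empirical-Fisher diagonal: average the squared per-sample gradients
diag = [torch.zeros_like(p) for p in model.parameters()]
for xi, yi in zip(x, y):
    loss = F.cross_entropy(model(xi.unsqueeze(0)), yi.unsqueeze(0))
    grads = torch.autograd.grad(loss, list(model.parameters()))
    for d, g in zip(diag, grads):
        d.add_(g ** 2 / len(x))

# diag now holds one tensor per parameter (same shapes), with no n x n storage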