Hi, I am trying to compute the Hessian matrix by calling autograd.grad() twice on a variable.
It works fine in a toy example:
import torch
from torch.autograd import Variable

a = torch.FloatTensor([1])
b = torch.FloatTensor([3])
a, b = Variable(a, requires_grad=True), Variable(b, requires_grad=True)
c = a + 3 * b**2
c = c.sum()
grad_b = torch.autograd.grad(c, b, create_graph=True)
grad2_b = torch.autograd.grad(grad_b, b, create_graph=True)
print(grad2_b)
Output:
Variable containing:
6
[torch.FloatTensor of size 1]
But here is the question: I want to compute the Hessian of a network,
so I define a function:
def calculate_hessian(loss, model):
    var = model.parameters()
    temp = []
    grads = torch.autograd.grad(loss, var, create_graph=True)[0]
    grads = torch.cat([g.view(-1) for g in grads])
    for grad in grads:
        grad2 = torch.autograd.grad(grad, var, create_graph=True)
        temp.append(grad2)
    return np.array(temp)
It returns an empty list []. Seems like the gradient of grad cannot be computed.
Any help?
g is an individual gradient, but x is a vector of weights.
Is that intended? Don’t you want to calculate individual second order gradients for each individual weight?
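For reference, here is a minimal sketch of that per-weight idea on a toy model (my own example, not the code from this thread): loop over the flattened gradient and keep the matching entry of each second-order gradient, one diagonal Hessian term per weight. Note that model.parameters() has to be materialized into a list, since it is a generator and gets exhausted after the first grad call (which may also be why the original function comes back empty).

import torch
import torch.nn as nn

model = nn.Linear(4, 1)                       # toy stand-in for the real network
x, y = torch.randn(8, 4), torch.randn(8, 1)
loss = ((model(x) - y) ** 2).mean()

params = list(model.parameters())             # materialize: model.parameters() is a generator
grads = torch.autograd.grad(loss, params, create_graph=True)
flat_grad = torch.cat([g.reshape(-1) for g in grads])

hess_diag = []
for i in range(flat_grad.numel()):
    # differentiate one scalar gradient entry w.r.t. all parameters,
    # then keep only the matching entry -> one diagonal element of the Hessian
    g2 = torch.autograd.grad(flat_grad[i], params, retain_graph=True)
    flat_g2 = torch.cat([h.reshape(-1) for h in g2])
    hess_diag.append(flat_g2[i])
hess_diag = torch.stack(hess_diag)
print(hess_diag)                              # one second-order term per individual weight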
However, if I add an x = torch.reshape(x, [-1]) line, I get:
> g2 = torch.autograd.grad(g, x, retain_graph=True)[0]
> RuntimeError: One of the differentiated Tensors appears to not have been used in the graph. Set allow_unused=True if this is the desired behavior.
Does anyone know what this means?
EDIT: Oh, I see that you pick g2[count] afterwards to get the diagonal of the Hessian, but I’m still confused why I can’t calculate the gradient of a scalar with respect to a scalar.
I’m using it to penalize the growth of second-order gradients, so basically the same reason anyone would penalize first-order gradients. In my case it’s to improve noise robustness.
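Roughly, it goes into the training loss. Here is a simplified sketch with placeholder names (not my actual training code); both grad calls use create_graph=True so the penalty term itself stays differentiable:

import torch
import torch.nn as nn

def loss_with_second_order_penalty(model, x, y, criterion, lam=1e-3):
    loss = criterion(model(x), y)
    params = [p for p in model.parameters() if p.requires_grad]
    grads = torch.autograd.grad(loss, params, create_graph=True)
    grad_norm_sq = sum(g.pow(2).sum() for g in grads)
    # differentiating the squared gradient norm yields a Hessian-vector product (2*H*g),
    # so penalizing it discourages growth of second-order terms while staying differentiable
    hvp = torch.autograd.grad(grad_norm_sq, params, create_graph=True)
    penalty = sum(h.pow(2).sum() for h in hvp)
    return loss + lam * penalty

# usage sketch
model = nn.Sequential(nn.Linear(4, 8), nn.Tanh(), nn.Linear(8, 1))
x, y = torch.randn(16, 4), torch.randn(16, 1)
total = loss_with_second_order_penalty(model, x, y, nn.MSELoss())
total.backward()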
So, will it be integrated into the loss function, or work as a layer of the CNN? Do you have any reference papers that use the Hessian to improve accuracy? I am also interested in classification/detection/segmentation.
No, you don’t need to reshape x. Since the graph was recorded during the previous gradient calculation, you should not change x if you want to calculate higher-order gradients.
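A tiny sketch of what goes wrong (my own illustration): torch.reshape creates a new tensor after the graph has already been built, so the first-order gradient was never computed from it, which is exactly the "appears to not have been used in the graph" error. Differentiating with respect to the original x works:

import torch

w = torch.randn(2, 2, requires_grad=True)
loss = (w ** 3).sum()
g = torch.autograd.grad(loss, w, create_graph=True)[0]   # first-order gradient, shape 2x2

w_flat = torch.reshape(w, [-1])   # new tensor: it was never used to compute loss or g
# torch.autograd.grad(g.reshape(-1)[0], w_flat)  # -> RuntimeError: ... appears to not have been used in the graph

g2 = torch.autograd.grad(g.reshape(-1)[0], w, retain_graph=True)[0]  # differentiate w.r.t. the original w instead
print(g2)   # 6*w[0,0] in the top-left entry, zeros elsewhere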
I have checked the solutions from @JIANG_GUOQING and @paul_c; overall, they are slow. The biggest problem is that autograd.grad() only works on a single output. Does anyone have a faster solution?
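One option worth trying is torch.autograd.functional.hessian (available in newer PyTorch versions), which wraps this double-grad loop and can vectorize it. A minimal sketch, assuming the loss can be written as a scalar function of a single flat tensor:

import torch
from torch.autograd.functional import hessian

def f(w):
    # toy scalar function standing in for a loss as a function of a flat parameter vector
    return (w ** 2).sum() + w[0] * w[1]

w = torch.randn(5)
H = hessian(f, w)   # (5, 5) Hessian; vectorize=True can speed this up on recent versions
print(H)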
@Yaroslav_Bulatov Sorry for the late reply. I believe that is exactly what I wanted. If possible, is there any tutorial with PyTorch code or code demos that you may know of? Thank you so much.
@tengerye even though Gauss-Newton is cheap to compute, the matrix is typically too large to store explicitly, so you’ll additionally need some kind of structured approximation. Here’s an example of computing diagonal and KFAC approximations of Gauss-Newton for linear layers – https://github.com/cybertronai/autograd-lib#autograd_lib
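To make the structure concrete, here is a deliberately naive sketch (my own illustration, not what autograd-lib does internally): for a squared-error loss the Gauss-Newton matrix is J^T J (up to a constant factor), so its diagonal is just the sum over outputs of squared per-output gradients. Looping over outputs like this is exactly what becomes infeasible for large output dimensions, hence the structured approximations.

import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(4, 8), nn.Tanh(), nn.Linear(8, 3))   # toy network, 3 outputs
x = torch.randn(4)
outputs = model(x)                                   # f_i(theta)
params = [p for p in model.parameters() if p.requires_grad]

# diag(G)_k = sum_i (d f_i / d theta_k)^2  for G = J^T J
gn_diag = [torch.zeros_like(p) for p in params]
for i in range(outputs.numel()):
    grads = torch.autograd.grad(outputs[i], params, retain_graph=True)
    for d, g in zip(gn_diag, grads):
        d += g ** 2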
I’m looking to efficiently compute the Hessian of my loss function with respect to my inputs (only the inputs, not the weights). Is this a suitable solution? I’m having some trouble understanding it.
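If the inputs are all you differentiate, one way is to hold the weights fixed and hand torch.autograd.functional.hessian a function of the input alone. A minimal sketch with a toy model (hypothetical names, not your code):

import torch
import torch.nn as nn
from torch.autograd.functional import hessian

model = nn.Sequential(nn.Linear(4, 8), nn.Tanh(), nn.Linear(8, 1))   # toy network
target = torch.randn(1, 1)
x = torch.randn(4)

def loss_wrt_input(x_flat):
    # the parameters are treated as constants; only the input is differentiated
    return ((model(x_flat.view(1, -1)) - target) ** 2).mean()

H = hessian(loss_wrt_input, x)   # (4, 4) Hessian of the loss w.r.t. the input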