# Compute the Hessian matrix of a network

Hi, I am trying to compute Hessian matrix by calling twice `autograd.grad()` on a variable.
It works fine in a toy example:

``````a = torch.FloatTensor()
b = torch.FloatTensor()

c = a + 3 * b**2
c = c.sum()

``````

Output:

``````Variable containing:
6
[torch.FloatTensor of size 1]
``````

But here is the question, I want to compute the Hessian of a network,
so I define a function:

``````def calculate_hessian(loss, model):
var = model.parameters()
temp = []

return np.array(temp)
``````

It returns an empty list `[]`. Seems like the gradient of grad cannot be computed.
Any help?

Thanks,

4 Likes

Waiting for the same solution.

1 Like

I use follow code to evaluate Hessian matrix:

``````# eval Hessian matrix
cnt = 0
g_vector = g.contiguous().view(-1) if cnt == 0 else torch.cat([g_vector, g.contiguous().view(-1)])
cnt = 1
l = g_vector.size(0)
hessian = torch.zeros(l, l)
for idx in range(l):
cnt = 0
g2 = g.contiguous().view(-1) if cnt == 0 else torch.cat([g2, g.contiguous().view(-1)])
cnt = 1
hessian[idx] = g2
return hessian.cpu().data.numpy()
``````

where loss_grad can calculate like: `autograd.grad(loss, net.parameters(), create_graph=True)`

Note: it’s only for small network

1 Like

Hi @JIANG_GUOQING , looking at your code, in the line

`g2 = torch.autograd.grad(g, x, retain_graph=True)`

`g` is an individual gradient, but `x` is a vector of weights.
Is that intended? Don’t you want to calculate individual second order gradients for each individual weight?

However, if I add `x = torch.reshape(x, [-1])` line I get

``````>     g2 = torch.autograd.grad(g, x, retain_graph=True)
> RuntimeError: One of the differentiated Tensors appears to not have been used in the graph. Set allow_unused=True if this is the desired behavior.
``````

Anyone knows what this means?

EDIT: Oh I see that you pick g2[count] afterwards to get the diagonal of the hessian, but I’m still confused why I can’t calculate a gradient of a scalar in respect to a scalar.

Interesting! Anyone can answer the purpose of hessian in cnn. Thanks

I’m using it to penalize growth of second order gradients, so basically the same reason why anyone would penalize first order gradients. In my case it’s to improve noise robustness.

3 Likes

So, it will be integrated in loss function or working as a layer of cnn? Do you have any reference paper using hessian to improve accuracy? I am also interested in classification/detection/segmentation

What do you mean “working as a layer”?

However it’s extremely slow, and I’m about to give up trying to implement it efficiently.

1 Like

No, you don’t need to reshape x. Since the graph was memorized in the previous gradient calculation, you should not change x if you wanna calculate high order gradient.

Hope it makes sense to you.

Did you use this code during training? I guess not, because switching to numpy would break autodiff…

Yes, you are right. But actually I turned on the training mode because I needed to calculate the high order of ‘Gradient’.

Michael Klachko via PyTorch Forums noreply@discuss.pytorch.org 于2019年3月28日周四 上午3:29写道：

I think currently it’s only possible to calculate Hessian w.r.t input x not the parameters of the network.

1 Like

I have checked with solutions of @JIANG_GUOQING and @paul_c, overall, they are slow. The biggest problem stuck with the `autograd.grad()` can only work on single output. Does anyone has faster solution?

Hi,

Unfortunately, these operations are expected to be relatively slow. You can find here a simple way to get Jacobian and Hessian matrices.