How to calculate the 2nd derivative of the diagonal of the Hessian matrix from a function?

Wei_Deng · March 18, 2018, 2:39am

For example, I followed this post [Second order derivatives and inplace gradient “zeroing”] to try to get the 2nd derivative, but grd.grad turns out to be None. Can anyone give me some suggestions?

import torch
from torch import Tensor
from torch.autograd import Variable
from torch.autograd import grad
from torch import nn

# some toy data
x = Variable(Tensor([4., 2.]), requires_grad=False)
y = Variable(Tensor([1.]), requires_grad=False)

# linear model and squared difference loss
model = nn.Linear(2, 1)
loss = torch.sum((y - model(x))**2)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)

# instead of using loss.backward(), use torch.autograd.grad() to compute gradients
loss_grads = grad(loss, model.parameters(), create_graph=True)

gn2 = sum([grd.norm()**2 for grd in loss_grads]) # 2nd derive
print('loss %f grad norm %f' % (loss.data, gn2.data))
model.zero_grad()
gn2.backward()
optimizer.step()

for grd in loss_grads:
    print(grd.grad)

The output is None.

Can anyone tell me how to get it?

tom · March 18, 2018, 3:38am

You can call grd.retain_grad() before backward to keep the grad of a non-leaf variable.
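A minimal sketch of that suggestion, assuming a recent PyTorch where plain tensors replace Variable. The names w, h and out are made up for illustration. By default autograd only keeps .grad on leaf tensors, so an intermediate result such as h needs retain_grad() before backward():

```python
import torch

w = torch.tensor([1.0, 2.0], requires_grad=True)  # leaf tensor
h = 3 * w                 # non-leaf: produced by an operation on w
h.retain_grad()           # ask autograd to keep h.grad after backward()

out = (h ** 2).sum()      # out = 9 * w^2 summed over entries
out.backward()

print(w.grad)  # d out / d w = 18 * w
print(h.grad)  # d out / d h = 2 * h; would be None without retain_grad()
```

Note that retain_grad() works in place and returns None, so its result should not be assigned back to the tensor.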

Best regards

Thomas

Wei_Deng · March 18, 2018, 5:20pm

Thanks Tom, I got the grad, but it is not correct. In the following example, I want to get the second derivative of (2x)^2 at x0=0.5153. The result gives the 1st-order derivative correctly, which is 8*x0=4.12221, but the second derivative is not the expected 8. Do you know why?

import torch
from torch import Tensor
from torch.autograd import Variable
from torch.autograd import grad
from torch import nn

torch.manual_seed(1)
x = Variable(Tensor([2.]), requires_grad=False)

model = nn.Linear(1, 1, bias=False)

x0 = [par.data for par in model.parameters()][0]
print(x0)

loss = torch.sum(model(x)**2)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_grads = grad(loss, model.parameters(), create_graph=True)
gn2 = sum([grd.norm()**2 for grd in loss_grads]) / 2 # 2nd derive
print('loss %f grad norm %f' % (loss.data, gn2.data))

for grd in loss_grads:
    grd.retain_grad()  # works in place and returns None, so do not reassign grd

model.zero_grad()
gn2.backward(retain_graph=True)

for grd in loss_grads:
    print(8 * x0, grd.data[0], grd.grad)

tom · March 18, 2018, 9:03pm

This correctly calculates d gn2 / d grd = d (0.5 grd^2) / d grd = grd, but maybe you want something else?
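To make the distinction concrete, here is a hedged sketch that replaces the nn.Linear layer with a bare weight tensor w (an assumption for illustration, set to the x0 value quoted above). Differentiating the gradient g with respect to w, rather than gn2 with respect to g, gives the expected second derivative:

```python
import torch

# w stands in for the single weight of nn.Linear(1, 1, bias=False)
w = torch.tensor([0.5153], requires_grad=True)
x = torch.tensor([2.0])

loss = ((w * x) ** 2).sum()                              # (2w)^2 = 4 w^2
(g,) = torch.autograd.grad(loss, w, create_graph=True)   # g = d loss / d w = 8 w
(h,) = torch.autograd.grad(g.sum(), w)                   # h = d g / d w = 8

print(g.detach(), h)
```

Here grd.grad from the earlier post corresponds to d gn2 / d g, which equals g itself, while h is the second derivative of the loss in w.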

Best regards

Thomas

Wei_Deng · March 19, 2018, 12:56am

Gotcha, that’s why the answers are the same. Thank you so much. Do you know how to calculate the second derivative of (x1)^2 + (2*x2)^2 with respect to x1 and x2, which should be (2, 8)?

Really appreciate your suggestions. Thanks a lot.

tom · March 19, 2018, 7:23am

I must admit that I’m confused about how the linear layer fits into what you want to achieve.
If you drop the nn.Linear and start with x as requires_grad = True, you get the 2nd derivative in x.grad…
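A sketch of what this could look like for (x1)^2 + (2*x2)^2, assuming a recent PyTorch. The evaluation point and the helper tensor scale are arbitrary choices for illustration:

```python
import torch

x = torch.tensor([0.3, -1.2], requires_grad=True)
scale = torch.tensor([1.0, 2.0])

f = ((scale * x) ** 2).sum()                          # x1^2 + (2*x2)^2
(g,) = torch.autograd.grad(f, x, create_graph=True)   # g = (2*x1, 8*x2)
g.sum().backward()                                    # accumulates d(sum g)/dx into x.grad

print(x.grad)  # (2, 8)
```

This yields the Hessian diagonal only because the function has no cross terms between x1 and x2. In general, backpropagating through g.sum() returns the Hessian multiplied by a vector of ones, not its diagonal.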

Wei_Deng · March 19, 2018, 5:45pm

Hi Tom, sorry for not explaining my question clearly.

My ultimate question is: given a neural network loss function, can I get the 2nd derivative of the likelihood function with respect to every weight? It doesn’t have to be the full Hessian matrix, just its diagonal.

Do you know if this is possible in the current version?

tom · March 19, 2018, 6:23pm

I don’t think that this is currently possible in general (unless you iterate over the scalar parameters). The fundamental reason is that backpropagation in PyTorch isn’t prepared to take derivatives of vector-valued functions, so you are limited to taking the derivative of a scalar sum of derivatives, in other words a Hessian-vector product.
For a small number of parameters using torch.autograd.grad will help, but I’m not sure it scales to all parameters of large nets.
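The per-parameter iteration mentioned above might look like the following sketch, which reuses the toy linear model from the first post. The variable names (params, hess_diag) are illustrative assumptions, and the loop costs one backward pass per scalar parameter, so it is only practical for small models:

```python
import torch
from torch import nn

torch.manual_seed(0)
model = nn.Linear(2, 1)
x = torch.tensor([4.0, 2.0])
y = torch.tensor([1.0])

params = list(model.parameters())
loss = ((y - model(x)) ** 2).sum()
# First-order gradients, kept differentiable via create_graph=True
grads = torch.autograd.grad(loss, params, create_graph=True)

hess_diag = []
for g, p in zip(grads, params):
    diag = torch.zeros_like(p)
    flat_g = g.reshape(-1)
    for i in range(flat_g.numel()):
        # Differentiate the i-th gradient entry w.r.t. p, keep only entry i
        (row,) = torch.autograd.grad(flat_g[i], p, retain_graph=True)
        diag.view(-1)[i] = row.reshape(-1)[i]
    hess_diag.append(diag)

for p, d in zip(params, hess_diag):
    print(tuple(p.shape), d)
```

For this squared-error loss the diagonal entries are 2*x_i^2 for the weights and 2 for the bias, which gives a quick sanity check on the loop.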

Best regards

Thomas

Wei_Deng · March 20, 2018, 12:59am

Got it, thanks a lot.

nima_rafiee · May 22, 2019, 8:51am

Is there a way to get the full Hessian matrix w.r.t. the input? Calling the backward() function twice only gives me the diagonal of the Hessian matrix, not the full one. I need something like tf.Hessian().
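One possible approach, assuming a PyTorch release that ships torch.autograd.functional (added after the earlier posts in this thread), is its hessian function. The test function f below is made up for illustration, with a cross term so the result is not diagonal:

```python
import torch
from torch.autograd.functional import hessian

def f(x):
    # x1^2 + (2*x2)^2 + x1*x2; the last term produces off-diagonal entries
    return x[0] ** 2 + (2 * x[1]) ** 2 + x[0] * x[1]

x = torch.tensor([0.3, -1.2])
H = hessian(f, x)   # full 2x2 Hessian of f at x

print(H)
```

For this f the Hessian is constant, with 2 and 8 on the diagonal and 1 off the diagonal. The function must return a scalar, and the cost grows with the number of inputs, so this is best suited to small input dimensions.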