For example, I used his blog post to try to get the 2nd derivative [Second order derivatives and inplace gradient “zeroing”], but it turns out that grd.grad is None. Can anyone give me some suggestions?

import torch
from torch import Tensor
from torch.autograd import Variable
from torch.autograd import grad
from torch import nn

# some toy data
x = Variable(Tensor([4., 2.]), requires_grad=False)
y = Variable(Tensor([1.]), requires_grad=False)

# linear model and squared difference loss
model = nn.Linear(2, 1)
loss = torch.sum((y - model(x))**2)

Thanks Tom, I got the grad, but it is not correct. In the following example, I want to get the second derivative of (2x)^2 at x0=0.5153. The result returns the 1st-order derivative correctly, which is 8*x0=4.12221, but the second derivative is not the expected 8. Do you know why?

import torch
from torch import Tensor
from torch.autograd import Variable
from torch.autograd import grad
from torch import nn

torch.manual_seed(1)
x = Variable(Tensor([2.]), requires_grad=False)

model = nn.Linear(1, 1, bias=False)

x0 = [par.data for par in model.parameters()][0]
print(x0)

loss = torch.sum(model(x)**2)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_grads = grad(loss, model.parameters(), create_graph=True)
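gn2 = sum([grd.norm()**2 for grd in loss_grads]) / 2  # half the squared gradient norm
print('loss %f grad norm %f' % (loss.item(), gn2.item()))

# keep .grad on the non-leaf gradient tensors after backward
for grd in loss_grads:
    grd.retain_grad()

model.zero_grad()
gn2.backward(retain_graph=True)

# grd.grad is d(gn2)/d(grd) = grd, i.e. 8 * x0 again, not the second derivative
for grd in loss_grads:
    print(8 * x0, grd.data[0], grd.grad)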
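
For comparison, here is a minimal sketch of one way to get the expected 8 in this toy setup: differentiate the first derivative once more with respect to the weight itself, rather than reading grd.grad. It assumes a PyTorch version where plain tensors carry requires_grad (no Variable wrapper); the names first and second are just for illustration.

```python
import torch
from torch import nn
from torch.autograd import grad

torch.manual_seed(1)
x = torch.tensor([2.0])

model = nn.Linear(1, 1, bias=False)
w = model.weight                      # plays the role of x0 in the post above

loss = torch.sum(model(x) ** 2)       # (2 * w)^2 = 4 * w^2

# First derivative, kept in the graph so it can be differentiated again.
(first,) = grad(loss, w, create_graph=True)   # 8 * w

# Second derivative: differentiate the (scalar) first derivative w.r.t. w.
(second,) = grad(first.sum(), w)              # constant 8 for this loss

print(first.detach(), second)
```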

Gotcha, that’s why the answers are the same. Thank you so much. Do you know how to calculate the second derivatives of (x1)^2 + (2*x2)^2 with respect to x1 and x2, which should be (2, 8)?

I must admit that I’m confused about how the linear layer fits into what you want to achieve.
If you drop the nn.Linear and start with x as requires_grad = True, you get the 2nd derivative in x.grad…
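
A rough sketch of that suggestion for the (x1)^2 + (2*x2)^2 example, assuming a recent PyTorch where tensors take requires_grad directly. The evaluation point is arbitrary. Note that backpropagating the sum of the first derivatives yields the row sums of the Hessian, which coincide with the second derivatives (2, 8) here only because this function has no mixed terms.

```python
import torch

# Arbitrary evaluation point; x is the leaf we differentiate with respect to.
x = torch.tensor([0.5, -1.5], requires_grad=True)

f = x[0] ** 2 + (2 * x[1]) ** 2

# First derivatives (2*x1, 8*x2), kept in the graph for a second pass.
(g,) = torch.autograd.grad(f, x, create_graph=True)

# Backpropagate the scalar sum of first derivatives; the result lands in x.grad.
g.sum().backward()

print(x.grad)  # second derivatives w.r.t. x1 and x2
```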

Hi Tom, sorry for not explaining my question clearly.

My ultimate question is: if I have a neural network loss function, can I get the 2nd derivative of the likelihood function with respect to every weight? It doesn’t have to be the full Hessian matrix, just its diagonal.

Do you know if we can do it with the current version?

I don’t think that this is currently possible in general (unless you iterate over the scalar parameters). The fundamental reason is that backpropagation in PyTorch isn’t prepared to take derivatives of vector-valued functions, so you are limited to taking the derivative of a scalar sum of derivatives, in other words a Hessian-vector product.
For a small number of parameters using torch.autograd.grad will help, but I’m not sure it scales to all parameters of large nets.
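
To make the two options concrete, here is a small sketch on the linear model from the first post. It assumes a recent PyTorch; the names vs, hvp and diag are made up for the example. The first part computes a Hessian-vector product from a scalar sum of derivatives, the second gets the Hessian diagonal by iterating over scalar parameters, which costs one extra backward pass per weight and so is unlikely to scale to large nets.

```python
import torch
from torch import nn
from torch.autograd import grad

torch.manual_seed(0)
x = torch.tensor([4.0, 2.0])
y = torch.tensor([1.0])

model = nn.Linear(2, 1)
params = list(model.parameters())

loss = torch.sum((y - model(x)) ** 2)
grads = grad(loss, params, create_graph=True)

# Hessian-vector product: differentiate the scalar sum_i g_i * v_i.
vs = [torch.ones_like(p) for p in params]          # example vector v
dot = sum((g * v).sum() for g, v in zip(grads, vs))
hvp = grad(dot, params, retain_graph=True)

# Hessian diagonal: one backward pass per scalar parameter.
diag = []
for g, p in zip(grads, params):
    g_flat = g.reshape(-1)
    d = torch.zeros(p.numel())
    for i in range(p.numel()):
        (row,) = grad(g_flat[i], p, retain_graph=True)
        d[i] = row.reshape(-1)[i]                  # keep only d^2 loss / d p_i^2
    diag.append(d.reshape(p.shape))

print(hvp)
print(diag)
```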

Is there a way to get the full Hessian matrix with respect to the input? Calling the backward() function twice only provides me with the diagonal of the Hessian matrix, not the full one. I need something like tf.Hessian().
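
One possible route, assuming a PyTorch release recent enough to ship torch.autograd.functional.hessian (it is not available in the versions discussed earlier in this thread): pass a scalar-valued function and the input, and the full matrix comes back. The function f below is a made-up example with a mixed term so the off-diagonal entries are nonzero.

```python
import torch
from torch.autograd.functional import hessian


def f(x):
    # Scalar function of the input with a mixed x0 * x1 term.
    return x[0] ** 2 * x[1] + (2 * x[1]) ** 2


x = torch.tensor([1.0, 3.0])

# Full 2x2 Hessian of f with respect to the input x.
H = hessian(f, x)
print(H)
```

For this f the entries are 2*x1, 2*x0 and 8, so the off-diagonal terms would be missed by a diagonal-only approach.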