Is there any way to get second order derivative or Hessian Matrix?

I am trying to get the Hessian matrix of the weights in a convolutional kernel. However, I cannot find an API for this like the one TensorFlow has.


TensorFlow will give you the diagonal of the Hessian, not the full Hessian (if I am not mistaken).
In the current version of PyTorch there is no way to do this, but we will have this feature in version 0.2, the next major release.

AFAIK TensorFlow will return you a Hessian-vector product like most automatic differentiation software.

Any updates regarding second order derivatives in PyTorch?


The autograd branch, which will be merged soon, supports repeated application of .backward (or, more conveniently, a new autograd.differentiate operator) and can compute the exact Hessian-vector product.
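For readers on a PyTorch release where torch.autograd.grad accepts create_graph, a Hessian-vector product can be sketched with two gradient passes. This is only a minimal sketch: the objective f(w) = sum(w**3) and the vector v are made-up placeholders, not anything from this thread.

```python
import torch

# Hypothetical toy objective: f(w) = sum(w**3), whose Hessian is diag(6 * w).
w = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
v = torch.tensor([1.0, 0.5, -1.0])  # arbitrary direction vector (assumption)

f = (w ** 3).sum()

# First pass: create_graph=True keeps the graph so the gradient itself
# can be differentiated again.
(g,) = torch.autograd.grad(f, w, create_graph=True)

# Second pass: differentiating the scalar g . v yields H @ v
# without ever materializing the full Hessian H.
(hvp,) = torch.autograd.grad(g @ v, w)

print(hvp)
```

The cost is roughly two backward passes regardless of the number of parameters, which is why most autodiff tools expose this product rather than the full matrix.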


Is the autograd branch compatible with the master branch right now? Does autograd.differentiate support taking the gradient of a higher-order function of the gradient? I dug around but couldn’t find the roadmap for the next release.

Has there been any update on this? That is, how to get the Hessian (even if just the diagonal) in PyTorch?

Would something like this work:

output = model.forward(input)
hess = input.grad

If you’re using the master branch, it’s something along these lines:

Thanks, will look into that!

It looks like torch.autograd does not have a “grad” function?

I’m using the latest PyTorch (0.1.12_2)

ImportError Traceback (most recent call last)
in ()
1 import torch
2 import torch.nn as nn
----> 3 from torch.autograd import Variable, grad
4 import torchvision.transforms as transforms

ImportError: cannot import name grad

It’s present in the master branch and not in 0.1.12.
You have to build PyTorch from source. Instructions are here:

Got it, thanks!

I installed PyTorch from source and now I have access to the grad and backward functions in torch.autograd.

However, when I try to use grad, my kernel just crashes. I use it like this:

output_img = model.forward(input_img)
g = grad(output_img, input_img, create_graph=True)

input_img: 1x3x256x256
output_img: 1x3x256x256

Is this not intended to be used with non-scalar (multi-dimensional) Tensors? The examples on GitHub work fine.


AFAIK not all operations have been modified to be twice differentiable yet. I saw a couple of open PRs.
I suspect that is why it’s crashing.
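Separately from the double-backward issue, on current PyTorch releases grad will not implicitly create a gradient for a non-scalar output: you either pass grad_outputs or reduce the output to a scalar first. A minimal sketch, assuming a single small Conv2d as a hypothetical stand-in for the model in the question and a smaller image size to keep it quick:

```python
import torch

# Hypothetical stand-in for the model in the question: one small convolution.
model = torch.nn.Conv2d(3, 3, kernel_size=3, padding=1)
input_img = torch.randn(1, 3, 8, 8, requires_grad=True)
output_img = model(input_img)

# Option 1: pass grad_outputs, the vector used in the vector-Jacobian product.
# A tensor of ones gives the gradient of output_img.sum().
(g,) = torch.autograd.grad(
    output_img,
    input_img,
    grad_outputs=torch.ones_like(output_img),
    create_graph=True,  # keep the graph in case g is differentiated again
)

# Option 2: reduce to a scalar first; this gives the same gradient.
(g_sum,) = torch.autograd.grad(model(input_img).sum(), input_img)

print(g.shape)
```

Either way the result has the shape of input_img, so it is a gradient of a scalar summary of the output, not a full Jacobian.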

I see, that makes sense, given that the model is a CNN with skip connections. Thanks!

Are there any updates on this? It’s been some time.

What’s wrong with just doing w.grad.backward()?
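One likely answer, on current PyTorch releases: after a plain backward(), w.grad is stored without a graph, so there is nothing left to differentiate. The gradient has to be built with create_graph=True first. A minimal sketch with a made-up scalar loss w**3:

```python
import torch

w = torch.tensor([2.0], requires_grad=True)

# Plain backward: the stored gradient has no grad_fn, so it cannot be
# differentiated again.
(w ** 3).sum().backward()
plain_has_graph = w.grad.grad_fn is not None

# With create_graph=True the gradient stays attached to the graph.
w.grad = None
(g,) = torch.autograd.grad((w ** 3).sum(), w, create_graph=True)
has_graph = g.grad_fn is not None

# Second derivative of w**3 is 6 * w.
(h,) = torch.autograd.grad(g.sum(), w)

print(plain_has_graph, has_graph, h)
```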

Since PyTorch 0.2.0, you can compute higher-order gradients. More information can be found here:

import torch
from torch import Tensor
from torch.autograd import Variable
from torch.autograd import grad
from torch import nn


x = Variable(torch.ones(2,1), requires_grad=True)
A = torch.FloatTensor([[1,2],[3,4]])


f = x.view(-1) @ A @ x

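# First derivative; create_graph=True keeps the graph for a second pass
x_1grad, = grad(f, x, create_graph=True)
print(A @ x + A.t() @ x)  # analytic gradient (A + A^T) x, for comparison

# Differentiate each gradient component to get one slice of the Hessian at a time
x_2grad0, = grad(x_1grad[0], x, create_graph=True)
x_2grad1, = grad(x_1grad[1], x, create_graph=True)

# Stack the two (2, 1) slices side by side into the 2x2 Hessian
Hessian = torch.cat((x_2grad0, x_2grad1), dim=1)

print(A + A.t())  # analytic Hessian, for comparison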
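On more recent PyTorch releases, the same quadratic-form check can be written with torch.autograd.functional.hessian, which does the per-component loop internally. This is a sketch assuming a version that ships torch.autograd.functional; the function name f and the 1-D shape of x are choices made for this example.

```python
import torch
from torch.autograd.functional import hessian

A = torch.tensor([[1.0, 2.0], [3.0, 4.0]])

def f(x):
    # Quadratic form x^T A x for a 1-D x; its Hessian is A + A^T.
    return x @ A @ x

x = torch.ones(2)
H = hessian(f, x)  # shape (2, 2)

print(H)
print(A + A.t())  # analytic Hessian, for comparison
```

This forms the full matrix, so it is only practical for small inputs; for something as large as a convolutional weight tensor, a Hessian-vector product is usually the cheaper route.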

I’ve worked on this issue for some time; please feel free to use the code below.