Hi! I am trying to compute the Hessian, but since torch.autograd.grad only accepts scalar outputs, my implementation is very inefficient.
Basically what I’m doing is:
input = Variable(torch.randn(1, 5), requires_grad=True)
target = Variable(torch.zeros(1).long())  # NLLLoss expects class indices (LongTensor)
criterion = nn.NLLLoss()
pred = model(input)  # let's say model is just an MLP that outputs a log-softmax
nll = criterion(pred, target)
input_grad, = torch.autograd.grad(nll, input, create_graph=True)  # grad returns a tuple
# autograd.grad only works with a scalar output, so I have to call grad once per
# component of input_grad, i.e. N times, where N is the input dimensionality.
hessian = [torch.autograd.grad(input_grad[0, i], input, create_graph=True)[0]
           for i in range(input.size(1))]
# Now this is what I want to do, but it doesn't work because input_grad is not a scalar:
hessian = torch.autograd.grad(input_grad, input, create_graph=True)
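For reference, here is a self-contained, runnable version of the loop I'm describing (the tiny MLP is just a hypothetical stand-in for my model, and I'm using the plain-tensor API rather than Variable):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-in for "model": a tiny MLP ending in log-softmax.
model = nn.Sequential(nn.Linear(5, 8), nn.Tanh(),
                      nn.Linear(8, 3), nn.LogSoftmax(dim=1))
criterion = nn.NLLLoss()

inp = torch.randn(1, 5, requires_grad=True)
target = torch.zeros(1, dtype=torch.long)  # NLLLoss wants class indices

nll = criterion(model(inp), target)
# create_graph=True so the gradient itself is differentiable.
(input_grad,) = torch.autograd.grad(nll, inp, create_graph=True)

# One backward pass per input dimension: each call gives one row of the Hessian.
rows = [torch.autograd.grad(input_grad[0, i], inp, retain_graph=True)[0]
        for i in range(inp.size(1))]
hessian = torch.cat(rows, dim=0)  # shape (5, 5)
```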
Is there a more efficient way to obtain the Hessian?
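(In case it helps anyone reading this on a newer version: recent PyTorch seems to provide torch.autograd.functional.hessian, which would do the whole thing in one call. A sketch, again with a hypothetical stand-in model:)

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-in model and loss.
model = nn.Sequential(nn.Linear(5, 8), nn.Tanh(),
                      nn.Linear(8, 3), nn.LogSoftmax(dim=1))
criterion = nn.NLLLoss()
target = torch.zeros(1, dtype=torch.long)

def loss_fn(x):
    # Scalar-valued function of the input, as hessian() requires.
    return criterion(model(x), target)

inp = torch.randn(1, 5)
H = torch.autograd.functional.hessian(loss_fn, inp)  # shape (1, 5, 1, 5)
H = H.reshape(5, 5)
```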
On a related note, is there an example of calculating the diagonal approximation to the Hessian, along the lines of Becker & LeCun (http://yann.lecun.com/exdb/publis/pdf/becker-lecun-89.pdf)?
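The closest I've gotten is computing the exact diagonal entries directly, i.e. d²L/dx_i² only, by taking the i-th element of each second backward pass (this is the exact diagonal, not the running Becker-LeCun estimate, and the model here is again a hypothetical stand-in):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

model = nn.Sequential(nn.Linear(5, 8), nn.Tanh(),
                      nn.Linear(8, 3), nn.LogSoftmax(dim=1))
criterion = nn.NLLLoss()

inp = torch.randn(1, 5, requires_grad=True)
target = torch.zeros(1, dtype=torch.long)

nll = criterion(model(inp), target)
(g,) = torch.autograd.grad(nll, inp, create_graph=True)

# Still one backward pass per dimension, but keep only the i-th element,
# giving just the Hessian diagonal d^2 L / dx_i^2.
diag = torch.stack([torch.autograd.grad(g[0, i], inp, retain_graph=True)[0][0, i]
                    for i in range(inp.size(1))])  # shape (5,)
```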