I have a quick question regarding how to implement the Laplacian of the output of a network efficiently. I’ve managed to implement a method for calculating it, however, I’m pretty sure the way I’m doing it is a pretty inefficient way.

What I mean by the Laplacian of the output of the network is, let’s say I have a simple feed-forward network, y = model(x) and I wish to calculate Sum_{i} d2y_dx_{i}2 for all inputs samples.

My current code for calculating the Laplacian for an N-dimensional input x is

y = model(x) #where model is an R^N to R^1 function
laplacian = torch.zeros(x.shape[0]) #array to store values of laplacian
for i, xi in enumerate(x):
hess = torch.autograd.functional.hessian(model, xi.unsqueeze(0), create_graph=True)
laplacian[i] = torch.diagonal(hess.view(N, N) offset=0).sum()

I’ve noticed for large input samples (which I need) the memory usage increases drastically. I would assume this is because I create a graph for each call of torch.autograd.functional.hessian. I need this because I need to differentiate the Laplacian with respect to the network parameters in my loss function.

My question is, would it be possible to perhaps reduce the memory usage on calculating the hessian or perhaps re-use the same graph but just for different inputs? (if that’s the correct way of phrasing it?)

The whole memory usage form the graph comes from intermediary results. So re-using the graph wouldn’t help.

I’m afraid that extracting the diagonal of the Hessian cannot be done more efficiently in general.
You could reduce the memory usage by doing a custom function that drops the non-diagonal elements earlier but that won’t be a huge change.

You might want to check packages like this one https://backpack.pt/ that provides approximate method to get the diagonal elements. If you are happy with an approximate value for your use case, it will be the fastest you can do!

Thanks for the quick response! On implementing a custom function, I did have a look at creating custom autograd functions which have forward and backward methods. I was wondering if it were at all possible to implement a backward of backward method in an autograd function?

What I mean by backward of backward would be an analytical expression for the 2nd order gradient of the loss function with respect to the output of the final layer. Because the final layer of my network is a determinant function, explicitly writing out those derivatives and calling them directly would be far more efficient than calling autograd and letting it solve it itself.