Second order derivatives with C++ API

Currently I’m using the following scheme to calculate second-order derivatives with the C++ API:

#include <torch/torch.h>
#include <iostream>

int main() {
    auto X = torch::ones({2}, torch::requires_grad(true)); // pseudo input
    auto Y = torch::dot(X,X); // pseudo model
    std::cout << "Output" << std::endl << Y << std::endl;

    Y.backward(c10::nullopt, /*retain_graph=*/true, /*create_graph=*/true);
    auto Xgrad = X.grad().clone(); // holds my first derivatives
    std::cout << "First derivatives" << std::endl << Xgrad << std::endl;

    for (int j = 0; j < 2; ++j) {
        X.grad().zero_(); // clear accumulated gradients before each pass
        Xgrad[j].backward(c10::nullopt, /*retain_graph=*/true, /*create_graph=*/false);
        std::cout << "Second derivatives [" << j << "]" << std::endl << X.grad() << std::endl;
        X.grad().detach_(); // without this, memory leaks when looping over the code
    }
    return 0;
}
In principle it works fine, but it is not very efficient for higher input dimensions. I know this is partly inherent to autograd, but is there something I can do better with the current C++ API, especially when I’m only interested in the diagonal elements of the Hessian?

I hope someone can give me a hint. Thanks!