Second order derivatives with C++ API

Currently I’m using the following scheme to calculate second-order derivatives with the C++ API:

#include <torch/torch.h>
#include <iostream>

int main() {
    auto X = torch::ones({2}, torch::requires_grad(true)); // pseudo input
    auto Y = torch::dot(X,X); // pseudo model
    std::cout << "Output" << std::endl << Y << std::endl;

    Y.backward(c10::nullopt, /*retain_graph=*/true, /*create_graph=*/true);
    auto Xgrad = X.grad().clone(); // holds my first derivatives
    std::cout << "First derivatives" << std::endl << Xgrad << std::endl;

    for (int j = 0; j < 2; ++j) {
        X.grad().zero_(); // clear accumulated gradients before each pass
        Xgrad[j].backward(c10::nullopt, /*retain_graph=*/true, /*create_graph=*/false);
        std::cout << "Second derivatives [" << j << "]" << std::endl << X.grad() << std::endl;
        X.grad().detach_(); // without this, memory leaks when looping over the code
    }
    return 0;
}
In principle it works fine, but it is not very efficient for higher input dimensions. I know this is partly inherent to autograd, but is there something I can do better with the current C++ API, especially when I’m only interested in the diagonal elements of the Hessian?

I hope someone can give me a hint. Thanks!