Let say I have a function f(x,y; theta) with two d-dimensional input variables x and y, along with the model parameters theta.

Is it possible to use torch.autograd to efficiently compute the diagonal of partial_x partial_y f(x,y; theta)?

From the previously thread [link], I can easily get the diagonal of Hessian by doing two backward() operation, but that seems to only hold for the f(x; theta) case (i.e. one input variable, but not two).

Any thoughts? very appreciated.