I want to compute 3rd order partial derivatives of a scalar function w.r.t. an input matrix. So if the input matrix X is 5x5 (so 25 entries), I think I should end up with (5x5)x(5x5)x(5x5) entries (25^3), right?
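For concreteness, here is a minimal sketch of the nested-`jacobian` approach with a toy scalar function standing in for the log partition function (the function and shapes here are hypothetical, just to show the mechanics and the resulting shapes):

```python
import torch
from torch.autograd.functional import jacobian

# Toy stand-in for the scalar function; the real one in the question is
# the CRF log partition function from torch-struct.
def scalar_func(X):
    return (X ** 3).sum()

X = torch.randn(5, 5, dtype=torch.float64)

# 1st derivative: shape (5, 5)
d1 = jacobian(scalar_func, X)

# 2nd derivative: shape (5, 5, 5, 5).
# create_graph=True on the inner call so the outer jacobian can
# differentiate through it.
d2 = jacobian(lambda Y: jacobian(scalar_func, Y, create_graph=True), X)

# 3rd derivative: shape (5, 5, 5, 5, 5, 5), i.e. 25**3 entries.
d3 = jacobian(
    lambda Z: jacobian(
        lambda Y: jacobian(scalar_func, Y, create_graph=True),
        Z,
        create_graph=True,
    ),
    X,
)

print(d3.shape)  # torch.Size([5, 5, 5, 5, 5, 5])
```

Note that every call except the outermost needs `create_graph=True`, otherwise the intermediate derivatives are detached and the next differentiation returns zeros.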

Main question: Is this correct and, if so, the right/best/fastest/… way to do it?

Side question: I noticed that the 2nd order derivatives already contain almost exclusively very small values, and the 3rd order derivatives contain almost only zeroes. Could this be caused by underflow, and is there a way I could fix this?

It’s the log partition function of a CRF, computed with a differentiable dynamic programming algorithm with torch-struct, which I want to differentiate (3x) w.r.t. the input log edge/node potentials. I can give you more details, but I don’t know if they’ll be relevant?

Having a bit of information about the function would help give me an idea of what you’re differentiating.

I have computed the 2nd derivative of a determinant function (which is why I asked if it’s that). When I made a custom version of that function, it also suffered from near-zero values, which I solved by using a custom torch.autograd.Function and manually defining the derivatives (with some torch.linalg.svd as well, in order to handle singular matrices).

Thanks for the suggestion :-). I’d prefer not to define the derivatives myself though; I think they could be quite complex, which is exactly why I’d like to make use of PyTorch’s automatic differentiation.

The first and main question is whether repeating torch.autograd.functional.jacobian like in my post theoretically gives me the correct 3rd order derivatives?

The 2nd question, the underflow problem, I could perhaps solve by passing lambda x: torch.exp(scalar_func(x)) as the argument instead of scalar_func, since scalar_func returns the log of the partition function? Or does this not make sense for numerical stability?

I think this should give the correct results, although you could check it by applying the Jacobian call twice and comparing against torch.autograd.functional.hessian (although hessian doesn’t have batch support, so you’d have to apply it iteratively).
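The suggested sanity check could look like this (an arbitrary smooth scalar function is used here as a stand-in; the real check would use your log partition function on an unbatched input, since hessian has no batch support):

```python
import torch
from torch.autograd.functional import jacobian, hessian

# Arbitrary smooth scalar function, only for comparing the two methods.
def scalar_func(X):
    return (X ** 2).sum() + torch.sin(X).prod()

X = torch.randn(4, 4, dtype=torch.float64)

# Hessian obtained by nesting two jacobian calls.
h_nested = jacobian(lambda Y: jacobian(scalar_func, Y, create_graph=True), X)

# Reference Hessian from torch.autograd.functional.hessian.
h_ref = hessian(scalar_func, X)

print(torch.allclose(h_nested, h_ref))  # True if the nesting is correct
```

If the two agree, the same nesting extended one level deeper should give valid 3rd-order derivatives.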

Exponentiating a log value can aid stability, but it generally becomes function-specific. For example, in my work I have the ratio of two quantities, and I can decompose that into the exp of a subtraction in the log domain.
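As a small illustration of that decomposition (the values here are made up to force underflow): computing a/b naively underflows when a and b are individually tiny, while exp(log a - log b) stays well-behaved because only the moderate ratio is ever exponentiated.

```python
import torch

# Two quantities so small that they underflow to 0 in float64
# (float64 underflows around exp(-745)), but whose ratio is moderate.
log_a = torch.tensor(-800.0, dtype=torch.float64)
log_b = torch.tensor(-798.0, dtype=torch.float64)

naive = torch.exp(log_a) / torch.exp(log_b)  # 0 / 0 -> nan
stable = torch.exp(log_a - log_b)            # exp(-2), perfectly fine

print(naive, stable)
```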