Higher-order gradients w.r.t. different functions

Hi,

thanks for elaborating on your use case.
When you say large gradients, do you have a specific norm in mind? I think you need to pick one before going multi-dimensional (if x isn’t scalar, the update rule in your original post runs into trouble with dimensionality).
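For example (purely illustrative, `grad` here is just a placeholder for the gradient of a non-scalar x), different norms give different notions of “large”:

```python
import torch

grad = torch.randn(5)       # placeholder gradient w.r.t. a non-scalar x
l2 = grad.norm()            # Euclidean (L2) norm
linf = grad.abs().max()     # infinity (max-abs) norm
```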
I’m not entirely sure what the best way is to “isolate” the two backward passes in full generality. Does the function f factor into “application of NNs” and “compute loss”? If so, detaching the outputs of the NNs might work well; there is a rough sketch below.
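To show what I mean by detaching, here is a minimal sketch, assuming f factors as f(x) = loss_fn(net(x)). The names `net`, `loss_fn`, and `x` are placeholders I made up, not from your post:

```python
import torch
import torch.nn as nn

net = nn.Linear(3, 3)                  # stand-in for the "application of NNs" part
loss_fn = lambda y: (y ** 2).sum()     # stand-in for the "compute loss" part

x = torch.randn(3, requires_grad=True)

y = net(x)
y_det = y.detach().requires_grad_(True)  # cut the autograd graph at the NN output
loss = loss_fn(y_det)

# Backward pass 1: only through the loss part. It stops at the detach
# boundary, so it never touches net's parameters or x.
(grad_y,) = torch.autograd.grad(loss, y_det)

# Backward pass 2: separately push that gradient through the NN part
# to recover d loss / d x; the two passes stay isolated.
(grad_x,) = torch.autograd.grad(y, x, grad_outputs=grad_y)
```

Whether this works for you depends on whether you actually need gradients to flow across that boundary (e.g. for higher-order terms, you would need `create_graph=True` in the relevant calls).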

Best regards

Thomas