I’ve put together a relatively short example that illustrates the point where this method fails when the loss function depends on derivatives of other terms. I’ve posted the example script on GitHub, and I think it captures my current issue with using hooks for per-sample gradients.
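For context, here is a minimal sketch of the kind of derivative-dependent loss I mean (not my actual script; the model and loss are stand-ins), computed per sample with the `torch.func` transforms instead of hooks, assuming torch >= 2.0:

```python
import torch
from torch.func import functional_call, grad, vmap

torch.manual_seed(0)

# Toy model u(x); the loss depends on du/dx, not just u.
model = torch.nn.Linear(1, 1)
params = dict(model.named_parameters())

def sample_loss(params, x):
    # x: a single sample of shape (1,)
    def u(xi):
        return functional_call(model, params, (xi.unsqueeze(0),)).squeeze()
    du_dx = grad(u)(x)          # derivative of the output w.r.t. the input
    return du_dx.pow(2).sum()   # loss built from that derivative

x = torch.randn(4, 1)
# vmap-ing grad over the batch yields true per-sample parameter gradients,
# including the second-order path through du/dx.
per_sample_grads = vmap(grad(sample_loss), in_dims=(None, 0))(params, x)
```

Here the per-sample parameter gradients have to flow through `du_dx`, which is exactly the second-order path that I believe the hook-based approach drops.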
Could this be a bug in how hooks handle such derivatives, or is it a limitation of the batch-supported Laplacian trick I referenced in my previous reply?
Thank you for all your help! It’s greatly appreciated!