Get per-sample gradients without clipping and noise addition

Hi @ffuuugor,

Is it possible to use this GradSampleModule with loss functions that themselves contain derivatives? I've managed to calculate per-sample gradients using a combination of forward_pre_hook and full_backward_hook, but I've noticed that this naive hook-based approach fails when the loss function contains terms that are themselves derivatives. Could Opacus be a solution for this?
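To make it concrete, here is a small hypothetical toy example (not my actual model) of the kind of loss I mean, where the loss is built from a derivative of the output with respect to the input, so calling backward() has to differentiate through a graph that was itself produced by autograd.grad:

```python
import torch
import torch.nn as nn

# Toy illustration (hypothetical, not my real model) of a loss that contains derivatives.
model = nn.Sequential(nn.Linear(1, 16), nn.Tanh(), nn.Linear(16, 1))

x = torch.randn(8, 1, requires_grad=True)   # batch of 8 samples
y = model(x)

# First derivative of the output w.r.t. the input, with create_graph=True
# so the derivative can appear inside the loss.
dy_dx, = torch.autograd.grad(y, x, grad_outputs=torch.ones_like(y), create_graph=True)

loss = (dy_dx ** 2).mean()   # the loss is built from a derivative term
loss.backward()              # this second backward pass is where my naive hooks break
```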

For clarity, this was briefly discussed in this thread: Per-sample gradient, should we design each layer differently? - #22 by AlphaBetaGamma96 and I wrote a small example snippet which highlights where it fails here: per-sample-gradient-limitation/example.py at main · AlphaBetaGamma96/per-sample-gradient-limitation · GitHub

Do you think Opacus could solve this issue?
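In case it helps, this is roughly what I'm hoping would work (just an untested sketch; I'm assuming GradSampleModule can be used on its own without the PrivacyEngine, so no clipping or noise is applied, and I just read off the grad_sample attributes):

```python
import torch
import torch.nn as nn
from opacus.grad_sample import GradSampleModule

# Same toy setup as above, but wrapped in GradSampleModule (no PrivacyEngine,
# so no clipping and no noise -- just the per-sample gradients).
model = nn.Sequential(nn.Linear(1, 16), nn.Tanh(), nn.Linear(16, 1))
gs_model = GradSampleModule(model)

x = torch.randn(8, 1, requires_grad=True)
y = gs_model(x)

# Derivative term inside the loss again, so backward() is a double-backward pass.
dy_dx, = torch.autograd.grad(y, x, grad_outputs=torch.ones_like(y), create_graph=True)

loss = (dy_dx ** 2).mean()
loss.backward()

# What I'd like to end up with: per-sample gradients of shape [batch_size, *param_shape].
for name, p in gs_model.named_parameters():
    print(name, p.grad_sample.shape)
```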

Thank you! :slight_smile: