Hi @ffuuugor,
Is it possible to use this `GradSampleModule` on loss functions which themselves contain derivatives? I've managed to compute per-sample gradients using a combination of `forward_pre_hook` and `full_backward_hook`; however, I've noticed that if the loss function contains terms that are themselves derivatives, the naive hook approach fails.
For clarity, this was briefly discussed in this thread: Per-sample gradient, should we design each layer differently? - #22 by AlphaBetaGamma96, and I wrote a small example snippet that highlights where it fails here: per-sample-gradient-limitation/example.py at main · AlphaBetaGamma96/per-sample-gradient-limitation · GitHub
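For concreteness, here is a minimal sketch of the kind of loss I mean (the model and loss here are made up for illustration, not taken from my linked snippet): the loss includes the derivative of the network output with respect to its input, so `backward()` through the loss is a double-backward computation, which is where naive hooks run into trouble.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# A small toy network (hypothetical, just for illustration).
model = nn.Sequential(nn.Linear(1, 8), nn.Tanh(), nn.Linear(8, 1))

# A batch of 4 samples; we need requires_grad=True on the input
# because the loss contains dy/dx.
x = torch.randn(4, 1, requires_grad=True)
y = model(x)

# dy/dx enters the loss itself, so we must build the derivative into
# the graph with create_graph=True to allow a second backward pass.
(dydx,) = torch.autograd.grad(y.sum(), x, create_graph=True)

# Loss with a derivative term, e.g. as in physics-informed training.
loss = (y ** 2).mean() + (dydx ** 2).mean()
loss.backward()  # second-order (double) backward through the hooks
```

The per-sample machinery would need to handle the extra backward pass through the same modules that `create_graph=True` introduces.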
Do you think Opacus could solve this issue?
Thank you!