Get per-sample gradients without clipping or adding noise

In my training process I need, at each step, to get the gradient tensors associated with each individual sample in the batch, perform some operation on each of these gradients, and finally combine them and call optimizer.step().
I found that Opacus could fit my problem, but:

  • I don’t need either the gradient clipping or the added noise

  • on the other hand, I need to access the set of gradients associated with each element of the batch and perform some operation on them; after these operations I’ll have a single gradient that will be used for the weight update (optimizer.step()).

Is it possible to do this with the Opacus library?

Hi @Torcione
Yes, you can use Opacus for that. Take a look at GradSampleModule (opacus/grad_sample/

It’s a wrapper around nn.Module that encapsulates per-sample gradient computation. When you wrap your model with GradSampleModule, each trainable parameter gets a .grad_sample attribute containing its per-sample gradients. No noise or clipping is performed.

Hope this helps

Hi @ffuuugor,

Is it possible to use GradSampleModule with loss functions that themselves contain derivatives? I’ve managed to calculate per-sample gradients using a combination of forward_pre_hook and full_backward_hook; however, I’ve noticed that if your loss function contains derivative terms, using hooks naively fails. Could Opacus be a solution for this?

For clarity, this was briefly discussed in the thread “Per-sample gradient, should we design each layer differently?” (post #22 by AlphaBetaGamma96), and I wrote a small example snippet that highlights where it fails in the GitHub repository AlphaBetaGamma96/per-sample-gradient-limitation (main branch).

Do you think opacus could solve this issue?

Thank you!

I’ll take a closer look at the code a bit later, but my first hunch is that Opacus won’t make much of a difference. We also use hooks (regular backward hooks rather than full_backward_hook; the regular ones are being deprecated in newer PyTorch versions), so I don’t see a reason why we wouldn’t face the same problem in Opacus.
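If hooks turn out to be the limiting factor, one hook-free route that may be worth trying is torch.func (PyTorch 2.0 or later). This is not something Opacus does for you in this form, just a rough sketch under my own assumptions: the tiny network, the input batch and the loss (du/dx - 1)^2 are all made up to stand in for a loss that contains a derivative of the model output.

```python
import torch
import torch.nn as nn
from torch.func import functional_call, grad, vmap

torch.manual_seed(0)

# Hypothetical scalar-in, scalar-out network and inputs.
model = nn.Sequential(nn.Linear(1, 8), nn.Tanh(), nn.Linear(8, 1))
params = {name: p.detach() for name, p in model.named_parameters()}
x_batch = torch.randn(5, 1)


def u(params, x):
    # Evaluate the model on a single sample x of shape [1]; returns a scalar.
    return functional_call(model, params, (x.unsqueeze(0),)).squeeze()


def sample_loss(params, x):
    # Made-up loss with a derivative term: (du/dx - 1)^2 for one sample.
    du_dx = grad(u, argnums=1)(params, x).squeeze()
    return (du_dx - 1.0) ** 2


# Differentiate the per-sample loss w.r.t. the parameters, then map over the batch.
per_sample_grads = vmap(grad(sample_loss), in_dims=(None, 0))(params, x_batch)

for name, g in per_sample_grads.items():
    print(name, tuple(g.shape))  # leading dimension is the batch size
```

Each entry of per_sample_grads has shape [batch, *param_shape], so the same "operate on each sample, then combine" step discussed earlier in the thread can be applied to it.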

Hi, I’m interested in this topic. How do you get per-sample gradients? Can you show some examples?

Hi @chensquan, the Medium post “Differential Privacy Series Part 2 | Efficient Per-Sample Gradient Computation in Opacus” (by PyTorch) explains this with an example. You can also check out this guide.