Hello, I believe Opacus has functionality for clipping the l2-norm of per-sample gradients, averaging the clipped gradients, and finally adding Gaussian noise to that average. Can someone point me to the source code for that?
I have tried looking on Opacus · Train PyTorch models with Differential Privacy. I am using the "privacy_engine.make_private_with_epsilon()" function, which has a "max_grad_norm" parameter, but I am not sure where to find the source code where the clipping and averaging of gradients are computed. I believe Opacus · Train PyTorch models with Differential Privacy provides source code for how per-sample gradients are computed, but again I could not find the source code for clipping the l2 norm of the gradients, averaging the gradients, and adding the noise.
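For reference, here is a minimal sketch in plain PyTorch of the three steps described above (per-sample l2 clipping, averaging, Gaussian noise). It is not the Opacus source code, just an illustration of the computation; it also simplifies things by handling a single parameter tensor, whereas Opacus computes each per-sample norm across all parameters jointly. The names max_grad_norm and noise_multiplier are meant to mirror the Opacus parameters of the same names.

import torch

def dp_sgd_gradient(per_sample_grads: torch.Tensor,
                    max_grad_norm: float,
                    noise_multiplier: float) -> torch.Tensor:
    """per_sample_grads: shape (batch_size, *param_shape) for one parameter."""
    batch_size = per_sample_grads.shape[0]

    # 1) Clip each sample's gradient so its l2 norm is at most max_grad_norm.
    per_sample_norms = per_sample_grads.flatten(start_dim=1).norm(2, dim=1)
    clip_factor = (max_grad_norm / (per_sample_norms + 1e-6)).clamp(max=1.0)
    clipped = per_sample_grads * clip_factor.view(
        -1, *([1] * (per_sample_grads.dim() - 1))
    )

    # 2) Sum the clipped gradients and add Gaussian noise calibrated to the
    #    clipping bound (std = noise_multiplier * max_grad_norm).
    summed = clipped.sum(dim=0)
    noise = torch.normal(mean=0.0, std=noise_multiplier * max_grad_norm,
                         size=summed.shape)

    # 3) Average over the batch (this corresponds to loss_reduction="mean").
    return (summed + noise) / batch_size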
Averaging: [expected_batch_size: batch_size used for averaging gradients. When using Poisson sampling, the averaging denominator can't be inferred from the actual batch size. Required if loss_reduction="mean", ignored if loss_reduction="sum".]
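Based on that docstring, my understanding of how the denominator is applied (a sketch, not the Opacus source): the clipped per-sample gradients are first summed, and only when loss_reduction="mean" is the sum divided by expected_batch_size, rather than by the realised batch size, which varies under Poisson sampling.

import torch

def scale_summed_grad(summed_grad: torch.Tensor,
                      expected_batch_size: int,
                      loss_reduction: str = "mean") -> torch.Tensor:
    # loss_reduction="mean": divide the summed (clipped, noised) gradient by
    # the expected batch size; loss_reduction="sum": leave it unscaled.
    if loss_reduction == "mean":
        return summed_grad / expected_batch_size
    return summed_grad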
Thanks, that was good to see. However, if I use the make_private_with_epsilon() method of the PrivacyEngine class, does the l2 norm of each per-sample gradient get clipped? I could not tell which norm gets clipped from the Gradient Clipping link you sent. Also, I did not find a parameter in this method that accounts for averaging of the gradients. Do you know how the averaging would be done if I use this method? I also do not know the noise distribution, as I did not find a corresponding parameter in this method. Any thoughts on these questions? I appreciate your time.
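For concreteness, here is a hedged usage sketch (the model, optimizer, and data loader are placeholders): make_private_with_epsilon() takes max_grad_norm as the per-sample l2 clipping bound and derives the Gaussian noise multiplier internally from the target (epsilon, delta) budget, so there is no explicit noise or averaging parameter to pass. The returned optimizer exposes the chosen multiplier.

import torch
from opacus import PrivacyEngine

model = torch.nn.Linear(10, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
data_loader = torch.utils.data.DataLoader(
    torch.utils.data.TensorDataset(torch.randn(64, 10),
                                   torch.randint(0, 2, (64,))),
    batch_size=8,
)

privacy_engine = PrivacyEngine()
model, optimizer, data_loader = privacy_engine.make_private_with_epsilon(
    module=model,
    optimizer=optimizer,
    data_loader=data_loader,
    target_epsilon=5.0,
    target_delta=1e-5,
    epochs=1,
    max_grad_norm=1.0,  # per-sample l2 clipping bound
)
# Sigma of the Gaussian noise, chosen internally to meet the epsilon target.
print(optimizer.noise_multiplier)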
Thanks, that is helpful. I had seen this link earlier, but I wasn't sure whether a page from the tutorials guarantees that the l2-norm of each per-sample gradient is indeed clipped and that the clipped gradients are averaged batch-wise. I am citing the results obtained from this code in a paper and just wanted to be sure.