Compare two cases. If there's no reduction (self.loss_reduction == "sum"), then we want to add noise calibrated to the clipping norm C. Indeed, let the per-sample gradients be g_1, …, g_B, each clipped to L2 norm at most C. The output is the sum g_1 + … + g_B, whose sensitivity to any one sample is C, so to make it private we add noise sampled (per coordinate) from the Gaussian distribution N(0, sigma^2 * C^2), masking the presence or absence of any one gradient vector.
If the reduction function is mean, then the output is (g_1 + … + g_B) / B. What should the additive noise be in this case? It must be the noise from before, scaled down by a factor of B: the mean is just the sum divided by B, so its sensitivity is C / B, and the noise standard deviation shrinks accordingly to sigma * C / B. The only difference between the two cases is the scaling factor, and it applies equally to the sensitive inputs and the noise. Right?
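A minimal NumPy sketch of the two cases (the function names and signatures are hypothetical, not from any particular library): the mean case is literally the sum case divided by B, so both the clipped gradients and the noise get scaled down by the same factor.

```python
import numpy as np

def noisy_sum(per_sample_grads, C, sigma, rng):
    """Sum reduction: sensitivity is C, so noise std is sigma * C per coordinate."""
    # Clip each per-sample gradient to L2 norm at most C.
    clipped = [g * min(1.0, C / np.linalg.norm(g)) for g in per_sample_grads]
    total = np.sum(clipped, axis=0)
    return total + rng.normal(0.0, sigma * C, size=total.shape)

def noisy_mean(per_sample_grads, C, sigma, rng):
    """Mean reduction: dividing the noisy sum by B scales the signal
    and the noise together, giving noise std sigma * C / B."""
    B = len(per_sample_grads)
    return noisy_sum(per_sample_grads, C, sigma, rng) / B
```

Because `noisy_mean` divides the entire noisy sum by B, the noise standard deviation automatically becomes sigma * C / B, which is exactly the scaling argued above.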