Hi, Opacus is a bit of a black box to me, so it would be great if you could clarify a few things about its internals.
First, in the DCGAN example: after privatizing the dataloader, the discriminator D, and D’s optimizer, each training iteration Poisson-samples ‘real’ with rate q and looks like this:
# (1) train on generated data
fake_loss = criterion(D(fake))
# (2) train on real (private) data
real_loss = criterion(D(real))
When I vary the number of times I do (1), the privacy cost increases, although it shouldn’t because training D on fake data has no dependence on the private real data. It should be a free operation by post-processing.
An alternative approach is to do the following in each iteration:
loss = criterion(D(fake)) + criterion(D(real))
Now I am curious about the semantics of what happens here. Are we clipping all the per-sample gradients (say, to norm 1), summing them up, adding Gaussian noise with sigma = noise_multiplier, and then finally dividing everything by the expected batch size (qN)? If so, I believe this approach satisfies DP without overcharging privacy.
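To make the question concrete, here is a minimal plain-Python sketch of the clip, sum, noise, divide sequence I have in mind. It does not call Opacus; the names `clip`, `dp_aggregate`, `max_norm` and `expected_batch_size` are my own, and I am assuming the noise standard deviation is noise_multiplier times the clipping norm.

```python
import math
import random


def clip(grad, max_norm):
    """Scale one per-sample gradient so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / (norm + 1e-6))
    return [g * scale for g in grad]


def dp_aggregate(per_sample_grads, dim, max_norm, noise_multiplier,
                 expected_batch_size, rng):
    """Clip each gradient, sum, add Gaussian noise, divide by qN.

    Assumption: noise std = noise_multiplier * max_norm, so with
    max_norm = 1 the std equals noise_multiplier.
    """
    summed = [0.0] * dim
    for grad in per_sample_grads:
        clipped = clip(grad, max_norm)
        summed = [s + c for s, c in zip(summed, clipped)]
    std = noise_multiplier * max_norm
    noisy = [s + rng.gauss(0.0, std) for s in summed]
    # Divide by the expected batch size (qN), not the realized one,
    # because Poisson sampling makes the realized size random.
    return [n / expected_batch_size for n in noisy]


rng = random.Random(0)
grads = [[3.0, 4.0], [0.3, 0.4]]  # toy per-sample gradients
update = dp_aggregate(grads, dim=2, max_norm=1.0, noise_multiplier=1.0,
                      expected_batch_size=2, rng=rng)
```

Passing `dim` explicitly keeps the function well defined when Poisson sampling happens to return an empty batch, in which case the update is pure noise divided by qN.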
Come to think of it, I think you’re right here: (1) should be free from a privacy standpoint, and we do indeed overcharge the privacy budget.
As for solutions, my first instinct is to have two separate optimizers for the discriminator. Both would cover the same set of parameters, but one would be private (DPOptimizer) and the other a regular optimizer.
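Roughly, the idea looks like the toy sketch below. It is plain Python rather than Opacus code: `ToyAccountant`, `sgd_step` and `train` are hypothetical stand-ins, and the gradients are placeholder constants. The point is only that both update paths touch the same parameters while just the step on real data is charged.

```python
class ToyAccountant:
    """Hypothetical stand-in for a privacy accountant: counts charged steps."""

    def __init__(self):
        self.charged_steps = 0

    def step(self):
        self.charged_steps += 1


def sgd_step(params, grad, lr):
    """Plain SGD update on a list of floats."""
    return [p - lr * g for p, g in zip(params, grad)]


def train(params, iterations, fake_steps_per_iter, accountant, lr=0.1):
    for _ in range(iterations):
        # (1) regular optimizer: generated data only, nothing is charged
        for _ in range(fake_steps_per_iter):
            fake_grad = [0.1 for _ in params]  # placeholder gradient
            params = sgd_step(params, fake_grad, lr)
        # (2) private optimizer: real data; the placeholder stands for a
        # gradient that was already clipped, noised and averaged
        real_grad = [0.2 for _ in params]
        params = sgd_step(params, real_grad, lr)
        accountant.step()  # only this step consumes privacy budget
    return params


acc_one, acc_five = ToyAccountant(), ToyAccountant()
p_one = train([1.0], iterations=10, fake_steps_per_iter=1, accountant=acc_one)
p_five = train([1.0], iterations=10, fake_steps_per_iter=5, accountant=acc_five)
```

In this toy, the number of charged steps depends only on `iterations`, no matter how many fake-data steps run in between.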
The proposal you’re describing could work as well, assuming you actually want to clip gradients for the fake data. From a privacy perspective you don’t have to, but I’m not sure which approach is best for model quality.
I’ve created a GitHub issue to track this: #418. If you want, feel free to send a PR with your approach. Alternatively, someone else will pick it up later.