Virtual batch sizing with multi-GPU training

Hi there,

I was wondering whether virtual batch sizing (as explained, for example, in Opacus · Train PyTorch models with Differential Privacy) is supposed to work with multi-GPU training? For me, virtual batch sizing works well (even if the model is wrapped in DifferentiallyPrivateDistributedDataParallel) when using a single GPU, but it fails when using multiple GPUs. The first couple of iterations work fine, but at some point self.pre_step() in DistributedDPOptimizer returns True for one rank while returning False for the other. The code then gets stuck because the first rank's optimizer takes a "real step" while the second one doesn't.
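For context, this is roughly the setup I'm using (a minimal sketch along the lines of the Opacus distributed/virtual-batching tutorials; the model, dataset, and hyperparameters here are just placeholders, and the exact keyword arguments may differ across Opacus versions):

```python
import torch
import torch.distributed as dist
from torch.utils.data import DataLoader, TensorDataset
from opacus import PrivacyEngine
from opacus.distributed import DifferentiallyPrivateDistributedDataParallel as DPDDP
from opacus.utils.batch_memory_manager import BatchMemoryManager

dist.init_process_group(backend="nccl")  # launched with one process per GPU
rank = dist.get_rank()
torch.cuda.set_device(rank)

model = DPDDP(torch.nn.Linear(32, 2).cuda(rank))
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
dataset = TensorDataset(torch.randn(10_000, 32), torch.randint(0, 2, (10_000,)))
data_loader = DataLoader(dataset, batch_size=256)

privacy_engine = PrivacyEngine()
# With a DPDDP-wrapped module, make_private hands back a DistributedDPOptimizer.
model, optimizer, data_loader = privacy_engine.make_private(
    module=model,
    optimizer=optimizer,
    data_loader=data_loader,
    noise_multiplier=1.0,
    max_grad_norm=1.0,
)

criterion = torch.nn.CrossEntropyLoss()
# Virtual batch sizing: logical batches get split into physical batches of <= 64.
with BatchMemoryManager(
    data_loader=data_loader,
    max_physical_batch_size=64,
    optimizer=optimizer,
) as memory_safe_loader:
    for x, y in memory_safe_loader:
        optimizer.zero_grad()
        loss = criterion(model(x.cuda(rank)), y.cuda(rank))
        loss.backward()
        # On the last physical batch of a logical batch, pre_step() returns True
        # and a "real" step (including the collective gradient sync) happens;
        # otherwise the gradients are only accumulated. When the ranks disagree
        # about which case they are in, training hangs here.
        optimizer.step()
```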

EDIT: After further investigation, the problem seems to be the following: due to Poisson sampling, the BatchSplittingSampler may split a logical batch into n physical batches, where n is not divisible by the number of GPUs used, so the ranks fall out of sync (see the toy illustration below).
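Here is a toy illustration of that mismatch (pure simulation, no Opacus calls; the dataset size, sample rate, and physical batch size are made-up numbers):

```python
import math
import numpy as np

rng = np.random.default_rng(0)
dataset_size_per_rank = 10_000
sample_rate = 256 / dataset_size_per_rank  # expected logical batch size: 256
max_physical = 64                          # physical (virtual) batch size
world_size = 2                             # number of GPUs / ranks

for step in range(5):
    # With Poisson sampling, each rank independently draws how many samples
    # land in its logical batch for this step.
    logical_sizes = rng.binomial(dataset_size_per_rank, sample_rate, size=world_size)
    # Splitting each logical batch into physical batches of at most max_physical.
    physical_counts = [math.ceil(n / max_physical) for n in logical_sizes]
    print(f"step {step}: logical sizes {logical_sizes.tolist()} -> "
          f"physical batches {physical_counts}")
    # If the counts differ, one rank reaches its last physical batch and tries
    # to take a real optimizer step (which involves a collective), while the
    # other rank still thinks it is mid-accumulation -> the ranks deadlock.
```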

Hmmm. That does sound like a bug. If not solved, Opacus should, at the very least, warn users about this. Thanks for flagging, @timudk. Could you please open an issue on GitHub?
