How does DataParallel handle batch norm?

In particular,

  1. Does each GPU separately compute its own batch norm statistics over the mini-batch allocated to it, or do the GPUs communicate with each other to compute those statistics?
  2. If the GPUs compute these statistics independently, how do they combine them, say, during inference or evaluation mode, and when do they do it?

Can someone please reply?

  1. Yes, each GPU separately computes its own statistics over its mini-batch.
  2. They don’t; at inference, the running statistics accumulated by the master GPU are used.
    This implementation of synchronized batch norm may be of interest if you do want the GPUs to share statistics across their respective mini-batches:
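To illustrate point 2: each `BatchNorm2d` replica updates its running statistics from its own mini-batch only, using the exponential-moving-average rule `running = (1 - momentum) * running + momentum * batch_stat`. A minimal single-device sketch of that update (mirroring what each GPU replica does independently):

```python
import torch
import torch.nn as nn

bn = nn.BatchNorm2d(4)  # momentum defaults to 0.1; running_mean starts at 0
x = torch.randn(8, 4, 5, 5)

bn.train()
_ = bn(x)  # updates running_mean/running_var from this mini-batch only

# After one step: running_mean = (1 - 0.1) * 0 + 0.1 * batch_mean
batch_mean = x.mean(dim=(0, 2, 3))
assert torch.allclose(bn.running_mean, 0.1 * batch_mean, atol=1e-6)
```

Under `DataParallel`, each replica runs this update on its own slice of the batch; only the master replica’s buffers end up in the saved model, so those are the statistics used in eval mode.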

PyTorch-compatible synchronized cross-GPU `encoding.nn.BatchNorm2d` and its example.

@zhanghang1989, would you be able to update links to the synchronized batch norm implementation as they don’t work anymore? Thanks!