How does DataParallel handle batch norm?

In particular,

  1. Does each GPU separately compute its own batch norm statistics over the mini-batch allocated to it, or do the GPUs communicate with each other to compute those statistics?
  2. If the GPUs compute these statistics independently, how do they combine them, say, during inference or evaluation mode, and when do they do it?

Can someone please reply?

  1. Yes, each GPU separately computes its own statistics over its mini-batch.
  2. They don’t; at inference, the running statistics accumulated by the master GPU are used.
    This implementation of synchronized batch norm may be of interest if you do want the GPUs to share statistics across their respective mini-batches:
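To illustrate point 2: each `BatchNorm2d` replica updates its running statistics from its own mini-batch only, using the exponential-moving-average rule `running = (1 - momentum) * running + momentum * batch_stat`. A minimal single-device sketch of that update (mirroring what each GPU replica does independently):

```python
import torch
import torch.nn as nn

bn = nn.BatchNorm2d(4)  # momentum defaults to 0.1; running_mean starts at 0
x = torch.randn(8, 4, 5, 5)

bn.train()
_ = bn(x)  # updates running_mean/running_var from this mini-batch only

# After one step: running_mean = (1 - 0.1) * 0 + 0.1 * batch_mean
batch_mean = x.mean(dim=(0, 2, 3))
assert torch.allclose(bn.running_mean, 0.1 * batch_mean, atol=1e-6)
```

Under `DataParallel`, each replica runs this update on its own slice of the batch; only the master replica’s buffers end up in the saved model, so those are the statistics used in eval mode.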

PyTorch-compatible synchronized cross-GPU `encoding.nn.BatchNorm2d` and its example.

@zhanghang1989, would you be able to update links to the synchronized batch norm implementation as they don’t work anymore? Thanks!