How does batch normalization work with multiple GPUs?

I am going to use 2 GPUs for data-parallel training, and the model contains batch normalization. I am wondering how PyTorch handles BN with 2 GPUs. Does each GPU estimate the mean and variance separately? And at test time, when I use only one GPU, which mean and variance will PyTorch use?


Do you have an answer to this yet?

According to the PyTorch documentation, batch norm is computed over each mini-batch, i.e., per GPU: with DataParallel each replica normalizes using the statistics of its own sub-batch, and the running mean/variance that survive for test time are the ones updated on the default device.
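To see what "per mini-batch" means concretely, here is a small sketch (not tied to any multi-GPU setup) showing that in training mode a `BatchNorm1d` layer normalizes with the mean and biased variance of the current batch only; under DataParallel, each replica would do this with its own sub-batch:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
bn = nn.BatchNorm1d(3)   # affine params initialize to weight=1, bias=0
x = torch.randn(8, 3)    # stands in for one per-GPU sub-batch

y = bn(x)                # training mode: normalizes with batch statistics

# Manual recomputation: per-feature mean and *biased* variance of this batch.
manual = (x - x.mean(0)) / torch.sqrt(x.var(0, unbiased=False) + bn.eps)
print(torch.allclose(y, manual, atol=1e-6))  # True
```

At test time (`bn.eval()`), the layer switches to the accumulated `running_mean` and `running_var` buffers instead of the current batch's statistics.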

Note that with DistributedDataParallel and SyncBatchNorm, the BN statistics can be synchronized across multiple GPUs, so the mean and variance are computed over the whole global batch.
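A minimal sketch of the conversion step, using a made-up toy model: `nn.SyncBatchNorm.convert_sync_batchnorm` replaces every `BatchNorm*d` layer with a `SyncBatchNorm`. In a real run you would call this before wrapping the model in DistributedDataParallel and launch one process per GPU:

```python
import torch
import torch.nn as nn

# Hypothetical model with an ordinary BatchNorm2d layer.
model = nn.Sequential(
    nn.Conv2d(3, 16, 3),
    nn.BatchNorm2d(16),
    nn.ReLU(),
)

# Replace all BatchNorm*d layers with SyncBatchNorm so statistics
# are computed across every process in the group, not per replica.
sync_model = nn.SyncBatchNorm.convert_sync_batchnorm(model)
print(type(sync_model[1]).__name__)  # SyncBatchNorm
```

After conversion you would typically do `model = nn.parallel.DistributedDataParallel(sync_model, device_ids=[rank])` inside each worker process.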