Does nn.BatchNorm default to being synchronized in distributed training?

When using torch.nn.parallel.DistributedDataParallel to parallelize the network across multiple GPUs, does nn.BatchNorm become synchronized among the GPUs?
I suppose it does, because DistributedDataParallel has a broadcast_buffers flag that defaults to True (see the sketch below).
Does anyone have any thoughts or confirmation on this?
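For concreteness, this is the kind of setup I have in mind (the toy model and the one-process-per-GPU rank handling are just placeholders):

```python
import torch
import torch.nn as nn
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

# Assumes torch.distributed.init_process_group("nccl") has already been
# called and that there is one process per GPU (rank == GPU index).
local_rank = dist.get_rank()
torch.cuda.set_device(local_rank)

# Toy model with a plain (per-GPU) nn.BatchNorm2d layer.
model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1),
    nn.BatchNorm2d(16),
    nn.ReLU(),
).to(local_rank)

# broadcast_buffers defaults to True, which is why I assumed the batch
# norm buffers would be kept in sync across GPUs.
ddp_model = DDP(model, device_ids=[local_rank], broadcast_buffers=True)
```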

Yes, the buffers in batch norm are synchronized between processes if broadcast_buffers=True. This means that at the start of each forward pass, all processes get a copy of the buffers (the running mean and variance) from the process with rank 0. Note that this does not synchronize the per-batch statistics used during training: each GPU still normalizes with statistics computed from its own local batch. If you want a truly synchronized batch norm, check out nn.SyncBatchNorm.
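A minimal sketch of the SyncBatchNorm route, assuming a one-process-per-GPU launch (the toy model and the NCCL backend are just placeholders):

```python
import torch
import torch.nn as nn
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

dist.init_process_group("nccl")
local_rank = dist.get_rank()  # assumes one process per GPU, rank == GPU index
torch.cuda.set_device(local_rank)

model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1),
    nn.BatchNorm2d(16),
    nn.ReLU(),
).to(local_rank)

# Replace every nn.BatchNorm*d layer with nn.SyncBatchNorm so that batch
# statistics are computed across all processes, not just per GPU.
model = nn.SyncBatchNorm.convert_sync_batchnorm(model)

ddp_model = DDP(model, device_ids=[local_rank])
```

The conversion has to happen before wrapping the model in DistributedDataParallel, since SyncBatchNorm relies on the process group being available during the forward pass.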
