When using torch.nn.parallel.DistributedDataParallel to parallelize the network on multiple GPUs, does nn.BatchNorm become synchronized among GPUs? I suppose it does, because there is a broadcast_buffers flag in DistributedDataParallel that defaults to True.
Does anyone have any thoughts or confirmation on this?
The buffers in batch norm are synchronized between processes if broadcast_buffers=True, yes. This means that at the start of every forward pass, all processes receive a copy of the buffers (the running mean and variance) from the process with rank 0. Note that this only keeps the running statistics consistent; the batch statistics used for normalization are still computed per process. If you want a truly synchronized batch norm, check out nn.SyncBatchNorm.
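For reference, here is a minimal sketch of that second option. It assumes a process group that has already been initialized (e.g. launched with torchrun and dist.init_process_group), and the small convolutional model is just a placeholder:

```python
import torch
import torch.nn as nn
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

# Assumes dist.init_process_group(backend="nccl") has already run,
# e.g. under torchrun with one process per GPU.

model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.BatchNorm2d(16),  # regular batch norm: per-process statistics
    nn.ReLU(),
)

# Replace every nn.BatchNorm*d layer with nn.SyncBatchNorm so that
# batch statistics are computed across all processes in the forward pass.
model = nn.SyncBatchNorm.convert_sync_batchnorm(model)

local_rank = dist.get_rank() % torch.cuda.device_count()
model = model.cuda(local_rank)

# broadcast_buffers=True (the default) only copies rank 0's buffers
# (running_mean / running_var) to the other ranks before each forward
# pass; it does not synchronize the batch statistics themselves.
ddp_model = DDP(model, device_ids=[local_rank], broadcast_buffers=True)
```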