Regular BatchNorm triggers buffer broadcast?

Are regular BatchNorm buffers (running_mean, running_var I guess) also broadcast and synchronized during forward pass when using DistributedDataParallel?

I thought that only SyncBatchNorm does this.

Also, will this broadcast/sync also happen in eval mode? (for SyncBatchNorm)

How is it controlled which buffers are synchronized and which are not?

Hi @vadimkantorov, I think only SyncBatchNorm broadcasts and syncs when using DDP, and it syncs mean/invstd/count across GPUs rather than syncing the buffers themselves; the buffers are maintained and updated locally. You can refer to the detailed implementation in pytorch/ at 251686fc4cb1962944ed99c938df2d54f3d62e46 · pytorch/pytorch · GitHub
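As a side note, the running stats of a plain BatchNorm layer are registered as buffers, not parameters, which is easy to check locally (a minimal single-process sketch, no DDP involved):

```python
import torch.nn as nn

bn = nn.BatchNorm1d(4)

# running_mean / running_var / num_batches_tracked are registered as buffers,
# while the affine weight / bias are learnable parameters
buffer_names = [name for name, _ in bn.named_buffers()]
param_names = [name for name, _ in bn.named_parameters()]

print(buffer_names)  # ['running_mean', 'running_var', 'num_batches_tracked']
print(param_names)   # ['weight', 'bias']
```

Anything that shows up in `named_buffers()` is part of the module state that buffer-synchronization mechanisms can see.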

For SyncBatchNorm, broadcast/sync happens only in train mode, not in eval mode.
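This mirrors single-GPU behavior: in eval mode a BatchNorm layer uses its stored running stats and does not update them, so there is nothing new to sync. A local sketch with a plain BatchNorm1d (as a single-process analogue):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
bn = nn.BatchNorm1d(3)
x = torch.randn(8, 3)

bn.train()
bn(x)  # train mode: running_mean / running_var are updated from the batch
mean_after_train = bn.running_mean.clone()

bn.eval()
bn(x)  # eval mode: stored stats are used for normalization, not updated
assert torch.equal(bn.running_mean, mean_after_train)
```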

I’m asking because we recently hit this problem: a model without SyncBatchNorm was deadlocking during broadcasting of buffers.

The cause of the deadlock turned out to be different: Checkpointing may cause the NCCL error · Issue #1166 · speechbrain/speechbrain · GitHub. But it was still strange that it went into the code path of broadcasting buffers at all; while debugging, the buffers in question were the stats of a regular BatchNorm1d.
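For what it's worth, that code path is most likely DDP's own buffer broadcast: DistributedDataParallel takes a `broadcast_buffers` constructor argument (True by default) that broadcasts all module buffers, including regular BatchNorm running stats, from rank 0 at the start of each forward pass, independently of SyncBatchNorm. A minimal single-process sketch with the gloo backend (CPU-only; the address/port values are placeholders for illustration):

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

# Single-process "group" just to be able to construct DDP locally;
# in a real job these come from the launcher.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group("gloo", rank=0, world_size=1)

model = torch.nn.BatchNorm1d(3)

# broadcast_buffers=False disables the per-forward buffer broadcast,
# so regular BatchNorm stats stay local to each rank
ddp = DDP(model, broadcast_buffers=False)

dist.destroy_process_group()
```

With the default `broadcast_buffers=True`, even a model with only regular BatchNorm layers will enter a broadcast collective on every forward, which would explain what showed up in the debugger.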