In DDP, broadcast_buffers is set to True by default. Is this necessary when the model uses SyncBatchNorm? From my reading of the implementation, every process receives gradients from all nodes during the backward pass, so I would expect the statistics to stay consistent across processes anyway.
So far my validation accuracy looks abnormal, similar to the problem reported in https://github.com/facebookresearch/maskrcnn-benchmark/issues/267 . I am thinking of replacing SyncBatchNorm with plain BatchNorm. In that case, if broadcast_buffers is set to True, I assume all nodes will use the statistics from the first node. Is that right?
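To make the question concrete, here is a minimal sketch of the two setups I am comparing. The toy model and the helper name wrap_for_ddp are made up for illustration and are not my real training code. As far as I understand the DDP documentation, broadcast_buffers applies to module buffers (for BatchNorm, the running statistics) rather than to parameters, and syncs them at the beginning of forward.

```python
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

# Hypothetical toy model, only here to have one BatchNorm layer to inspect.
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3),
    nn.BatchNorm2d(8),
    nn.ReLU(),
)

# The BatchNorm running statistics are registered as buffers, not parameters,
# so they are what the broadcast_buffers flag is about.
buffer_names = [name for name, _ in model.named_buffers()]
print(buffer_names)

# Setup A: convert every BatchNorm layer to SyncBatchNorm.
# For a container such as nn.Sequential the children are replaced in place.
sync_model = nn.SyncBatchNorm.convert_sync_batchnorm(model)
print(type(sync_model[1]).__name__)


def wrap_for_ddp(module, local_rank, broadcast_buffers=True):
    """Hypothetical helper: wrap a module in DDP on one GPU per process.

    Assumes torch.distributed.init_process_group() has already been called
    and that `module` already lives on device `cuda:local_rank`.
    """
    return DDP(
        module,
        device_ids=[local_rank],
        broadcast_buffers=broadcast_buffers,
    )
```

Setup B would skip the convert_sync_batchnorm call and pass the plain BatchNorm model to wrap_for_ddp with broadcast_buffers=True. The helper is not called here because it needs an initialized process group and a GPU per process.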