LayerNorm sync during distributed training


My understanding is:
BatchNorm can be synced during distributed training via the API torch.nn.SyncBatchNorm.convert_sync_batchnorm().
Can LayerNorm also be synced during distributed training via torch.nn.SyncBatchNorm.convert_sync_batchnorm()?

model_sync = torch.nn.SyncBatchNorm.convert_sync_batchnorm(model)
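As a quick check of what the conversion actually does, here is a small sketch with a toy model (the model here is just an illustrative example, not from the original post): convert_sync_batchnorm() recursively replaces BatchNorm modules with SyncBatchNorm and leaves other modules, including LayerNorm, untouched.

```python
import torch

# Toy model containing both a BatchNorm and a LayerNorm layer
model = torch.nn.Sequential(
    torch.nn.Linear(8, 8),
    torch.nn.BatchNorm1d(8),  # will be converted to SyncBatchNorm
    torch.nn.LayerNorm(8),    # will be left unchanged
)

model_sync = torch.nn.SyncBatchNorm.convert_sync_batchnorm(model)

print(type(model_sync[1]).__name__)  # SyncBatchNorm
print(type(model_sync[2]).__name__)  # LayerNorm
```

So even if you call the conversion on a model with LayerNorm layers, they are simply skipped.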

The reason to sync BatchNorm is that it collects statistics across samples (i.e., the elements of a minibatch), which may reside on different GPUs.
LayerNorm does not mix statistics between elements of a minibatch; it computes statistics only within a single sample, which lives entirely on one GPU. So there is nothing to sync.
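The point above can be verified numerically: splitting a batch across devices does not change LayerNorm's output, because its statistics are per-sample. A small single-process sketch (the tensors here are illustrative, with the batch split standing in for two GPUs):

```python
import torch

torch.manual_seed(0)
x = torch.randn(4, 8)  # a minibatch of 4 samples
ln = torch.nn.LayerNorm(8)

full = ln(x)                                # whole batch on one "GPU"
split = torch.cat([ln(x[:2]), ln(x[2:])])   # batch split across two "GPUs"
print(torch.allclose(full, split))          # True: per-sample stats, nothing to sync

bn = torch.nn.BatchNorm1d(8)                # training mode: uses batch statistics
bn_full = bn(x)
bn_split = torch.cat([bn(x[:2]), bn(x[2:])])
print(torch.allclose(bn_full, bn_split))    # False in general: stats depend on the batch
```

LayerNorm gives identical results however the batch is sharded, while BatchNorm's output changes with the shard because its mean/variance are computed across the batch dimension, which is exactly why only BatchNorm needs SyncBatchNorm.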

Best regards


@tom ,
LayerNorm does not collect statistics across samples, so we don't need to sync LayerNorm, right?

If we sync LayerNorm, what will happen?

Sorry, the initial answer was off, so I edited it. Thank you for pointing out that the first version wasn't a working explanation.

@tom ,
Thank you!
We don't need to sync LayerNorm across different GPUs.