My understanding is:
BatchNorm can be synchronized during distributed training via the API torch.nn.SyncBatchNorm.convert_sync_batchnorm().
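For context, the usual pattern is a conversion before wrapping the model in DDP. A minimal sketch, assuming a standard torchrun launch with the process group already initialized (the toy model and `local_rank` handling here are just placeholders):

```python
import os
import torch
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

# Assumes torchrun has launched the processes and the process group is
# initialized, e.g. torch.distributed.init_process_group(backend="nccl")
local_rank = int(os.environ["LOCAL_RANK"])  # set by torchrun

model = nn.Sequential(
    nn.Conv2d(3, 16, 3),
    nn.BatchNorm2d(16),  # replaced by SyncBatchNorm below
    nn.ReLU(),
).cuda(local_rank)

# Swap every BatchNorm*d layer for SyncBatchNorm so batch statistics are
# reduced across all processes, then wrap in DDP as usual.
model = nn.SyncBatchNorm.convert_sync_batchnorm(model)
model = DDP(model, device_ids=[local_rank])
```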
Can LayerNorm also be synchronized during distributed training via torch.nn.SyncBatchNorm.convert_sync_batchnorm()?
The reason BatchNorm needs to be synced is that it collects statistics across the samples (elements) of a minibatch, and in distributed data-parallel training those samples are spread over different GPUs.
LayerNorm does not combine statistics across the elements of a minibatch; it computes its statistics entirely within each individual sample, which lives on a single GPU. So there is nothing to sync. (And convert_sync_batchnorm only replaces BatchNorm layers anyway; it leaves LayerNorm modules untouched.)
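A quick non-distributed sketch of the same point: the LayerNorm output for a given sample does not change when the rest of the batch changes, while the BatchNorm output does, because only BatchNorm's statistics depend on the other samples in the batch.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(4, 8)      # batch of 4 samples, 8 features each
ln = nn.LayerNorm(8)       # per-sample statistics (over the feature dim)
bn = nn.BatchNorm1d(8)     # per-batch statistics (over the batch dim)

# LayerNorm: sample 0 is normalized the same way whether the batch has 4 or 2 samples.
print(torch.allclose(ln(x)[0], ln(x[:2])[0]))   # True

# BatchNorm (training mode): statistics come from the whole batch, so the result changes.
print(torch.allclose(bn(x)[0], bn(x[:2])[0]))   # False
```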