My understanding is:
BatchNorm can be synchronized during distributed training via the API torch.nn.SyncBatchNorm.convert_sync_batchnorm().
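For context, the usual pattern is a conversion before wrapping the model in DDP. A minimal sketch, assuming a standard torchrun launch with the process group already initialized (the toy model and `local_rank` handling here are just placeholders):

```python
import os
import torch
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

# Assumes torchrun has launched the processes and the process group is
# initialized, e.g. torch.distributed.init_process_group(backend="nccl")
local_rank = int(os.environ["LOCAL_RANK"])  # set by torchrun

model = nn.Sequential(
    nn.Conv2d(3, 16, 3),
    nn.BatchNorm2d(16),  # replaced by SyncBatchNorm below
    nn.ReLU(),
).cuda(local_rank)

# Swap every BatchNorm*d layer for SyncBatchNorm so batch statistics are
# reduced across all processes, then wrap in DDP as usual.
model = nn.SyncBatchNorm.convert_sync_batchnorm(model)
model = DDP(model, device_ids=[local_rank])
```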
Can LayerNorm also be synchronized during distributed training via torch.nn.SyncBatchNorm.convert_sync_batchnorm()?
The reason BatchNorm needs to be synced is that it collects statistics across the samples (elements) of a minibatch, and in distributed data-parallel training those samples are spread over different GPUs.
LayerNorm does not combine statistics across the elements of a minibatch; it computes its statistics entirely within each individual sample, which lives on a single GPU. So there is nothing to sync. (And convert_sync_batchnorm only replaces BatchNorm layers anyway; it leaves LayerNorm modules untouched.)
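A quick non-distributed sketch of the same point: the LayerNorm output for a given sample does not change when the rest of the batch changes, while the BatchNorm output does, because only BatchNorm's statistics depend on the other samples in the batch.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(4, 8)      # batch of 4 samples, 8 features each
ln = nn.LayerNorm(8)       # per-sample statistics (over the feature dim)
bn = nn.BatchNorm1d(8)     # per-batch statistics (over the batch dim)

# LayerNorm: sample 0 is normalized the same way whether the batch has 4 or 2 samples.
print(torch.allclose(ln(x)[0], ln(x[:2])[0]))   # True

# BatchNorm (training mode): statistics come from the whole batch, so the result changes.
print(torch.allclose(bn(x)[0], bn(x[:2])[0]))   # False
```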