I've noticed that most people first convert BatchNorm to SyncBatchNorm and then wrap the model with DistributedDataParallel:
model = nn.SyncBatchNorm.convert_sync_batchnorm(model)
model = torch.nn.parallel.DistributedDataParallel(model, device_ids=[local_rank])
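For context, here is a minimal sketch of the full setup I have in mind (assuming a single-node launch with torchrun, so LOCAL_RANK is set in the environment; the Sequential model is just a placeholder containing a BatchNorm layer):

import os
import torch
import torch.distributed as dist
import torch.nn as nn

# torchrun sets LOCAL_RANK for each spawned process
dist.init_process_group(backend="nccl")
local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)

# placeholder model with a BatchNorm2d layer to be converted
model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1),
    nn.BatchNorm2d(16),
    nn.ReLU(),
).cuda(local_rank)

# convert BatchNorm -> SyncBatchNorm first, then wrap with DDP
model = nn.SyncBatchNorm.convert_sync_batchnorm(model)
model = torch.nn.parallel.DistributedDataParallel(model, device_ids=[local_rank])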
If I reverse the order, as shown below, would I get the same results?
model = torch.nn.parallel.DistributedDataParallel(model, device_ids=[local_rank])
model = nn.SyncBatchNorm.convert_sync_batchnorm(model)