How could I use sync-bn correctly?


Do I need to convert each batchnorm layer to its sync-bn version, or do I only need to convert the whole model?
I mean:

class Model(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(...)
        self.bn = nn.SyncBatchNorm.convert_sync_batchnorm(nn.BatchNorm2d(...))

model = Model()

Or this:

class Model(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(...)
        self.bn = nn.BatchNorm2d(...)

model = nn.SyncBatchNorm.convert_sync_batchnorm(Model())

Which one is the correct usage? Can I simply call the convert function on the whole model?

The docs show an example, where it’s called on the complete module, which should work.
Are you seeing any issues in this approach?
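For reference, a minimal runnable sketch of the whole-model approach from the docs (the layer shapes here are made up for illustration) — `convert_sync_batchnorm` walks the module tree and swaps every `BatchNorm*d` for `SyncBatchNorm`:

```python
import torch
import torch.nn as nn

# Toy model with an ordinary BatchNorm2d layer (sizes are arbitrary).
class Model(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 8, kernel_size=3, padding=1)
        self.bn = nn.BatchNorm2d(8)

    def forward(self, x):
        return self.bn(self.conv(x))

# Convert every BatchNorm layer in the module tree in one call.
model = nn.SyncBatchNorm.convert_sync_batchnorm(Model())
print(type(model.bn).__name__)  # SyncBatchNorm
```

The conversion itself needs no distributed setup; the synchronized statistics only come into play once the model runs under `DistributedDataParallel`.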

Yes, I found that training becomes quite slow and convergence takes longer. As for the final results, the second method performed worse than the first in my experiments.

I have figured out my problem; it has nothing to do with how convert_sync_batchnorm is used. The fix is that when using apex, convert_sync_batchnorm must be called before initializing amp. Calling it after amp initialization causes the problems I saw.
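For anyone hitting the same issue, the working order described above looks roughly like this — a sketch assuming NVIDIA apex's `amp` API (`amp.initialize` and the `opt_level` argument are apex's, not part of PyTorch):

```python
import torch
import torch.nn as nn
from apex import amp  # assumes NVIDIA apex is installed

model = Model().cuda()  # Model as defined earlier in the thread
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# 1) Convert BatchNorm layers to SyncBatchNorm FIRST...
model = nn.SyncBatchNorm.convert_sync_batchnorm(model)

# 2) ...then hand the already-converted model to amp.
model, optimizer = amp.initialize(model, optimizer, opt_level="O1")
```

Note that apex has since been deprecated in favor of the native `torch.cuda.amp` API, where this ordering pitfall does not arise in the same way.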