How could I use sync-bn correctly?

coincheung · April 16, 2020, 2:59am

Hi,

Do I need to convert each BatchNorm layer to its SyncBatchNorm version, or is it enough to convert the whole model?
That is, should I do this:

import torch.nn as nn

class Model(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(...)
        # convert this single BatchNorm layer to SyncBatchNorm
        self.bn = nn.SyncBatchNorm.convert_sync_batchnorm(nn.BatchNorm2d(...))
    ...

model = Model()

Or this:

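import torch.nn as nn

class Model(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(...)
        self.bn = nn.BatchNorm2d(...)
    ...

# convert every BatchNorm layer in the finished model at once
model = nn.SyncBatchNorm.convert_sync_batchnorm(Model())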

Which one is the correct usage? Can I simply call the convert function on the whole model?
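A quick way to compare the two variants is to build both and inspect the resulting layer. This is a minimal sketch: the class names `ModelA`/`ModelB` and the layer sizes are hypothetical, chosen only so the code runs. Calling `convert_sync_batchnorm` on a single `BatchNorm2d` returns a `SyncBatchNorm`, so both variants should end up with the same layer type.

```python
import torch.nn as nn

class ModelA(nn.Module):
    """Variant 1: convert the BatchNorm layer individually."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 16, kernel_size=3, padding=1)  # illustrative sizes
        self.bn = nn.SyncBatchNorm.convert_sync_batchnorm(nn.BatchNorm2d(16))

class ModelB(nn.Module):
    """Variant 2: plain BatchNorm, converted later through the whole model."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 16, kernel_size=3, padding=1)
        self.bn = nn.BatchNorm2d(16)

model_a = ModelA()
model_b = nn.SyncBatchNorm.convert_sync_batchnorm(ModelB())

# Both models now hold a SyncBatchNorm layer with the same configuration
print(type(model_a.bn).__name__, type(model_b.bn).__name__)  # SyncBatchNorm SyncBatchNorm
print(model_a.bn.num_features == model_b.bn.num_features)    # True
```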

ptrblck · April 16, 2020, 4:43am

The docs show an example where it's called on the complete module, so that approach should work.
Are you seeing any issues in this approach?
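To check that calling it on the complete module is enough, here is a small sketch (the network below is hypothetical). `convert_sync_batchnorm` walks the module tree recursively, so nested BatchNorm layers are replaced too, and the existing parameters and running statistics are carried over. The conversion itself does not need an initialized process group, so this runs in a single CPU process.

```python
import torch
import torch.nn as nn

# Hypothetical network with BatchNorm layers at different nesting depths
net = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.BatchNorm2d(8),
    nn.ReLU(),
    nn.Sequential(
        nn.Conv2d(8, 8, kernel_size=3, padding=1),
        nn.BatchNorm2d(8),
    ),
)
with torch.no_grad():
    net[1].running_mean.fill_(0.5)  # pretend these statistics were already trained

converted = nn.SyncBatchNorm.convert_sync_batchnorm(net)

# Collect the names of all layers that are now SyncBatchNorm
sync_layers = [name for name, m in converted.named_modules()
               if isinstance(m, nn.SyncBatchNorm)]
print(sync_layers)                          # ['1', '3.1']
print(converted[1].running_mean[0].item())  # 0.5
```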

coincheung · April 16, 2020, 5:47am

Yes, I found that training becomes quite slow and takes longer to converge. In my experiments, the final results are also worse with the second method than with the first.

coincheung · April 20, 2020, 3:40am

I have figured out my problem, and it has nothing to do with how convert_sync_bn is called. If I use apex, I need to call convert_sync_bn before initializing amp; calling it afterwards causes problems.
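The ordering described above can be sketched as follows. This is a minimal sketch, not a reference setup: `build_training_objects` is a hypothetical helper, it assumes NVIDIA apex's `amp.initialize(model, optimizer, opt_level=...)` API, and apex is only used when it is installed and a GPU is available, so the same code also runs on a CPU-only machine.

```python
import torch
import torch.nn as nn

try:
    from apex import amp  # NVIDIA apex, optional
except ImportError:
    amp = None

def build_training_objects(model: nn.Module, lr: float = 0.1):
    """Hypothetical helper showing the order: convert BN -> optimizer -> amp."""
    # 1. Convert BatchNorm layers first, while the model is still a plain nn.Module
    model = nn.SyncBatchNorm.convert_sync_batchnorm(model)
    if torch.cuda.is_available():
        model = model.cuda()
    # 2. Create the optimizer from the converted model's parameters
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    # 3. Only then hand model and optimizer to apex amp (needs apex and a GPU)
    if amp is not None and torch.cuda.is_available():
        model, optimizer = amp.initialize(model, optimizer, opt_level="O1")
    # Wrapping in DistributedDataParallel (not shown) would come after this step
    return model, optimizer

model, optimizer = build_training_objects(
    nn.Sequential(nn.Conv2d(3, 4, kernel_size=3), nn.BatchNorm2d(4))
)
print(type(model[1]).__name__)  # SyncBatchNorm
```

Doing the conversion first means amp (and later DDP) only ever sees the final SyncBatchNorm modules.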