Is it always recommended to use SyncBN (SyncBatchNorm) whenever you use DDP? Are there any exceptions?
SyncBN is usually used because, in large training runs, the batch size per GPU is often quite small, so each process can't gather enough statistics on its own for batch normalization to be reliable. If your per-GPU batch size is large enough, you may be able to get away without SyncBN.
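To make the statistics argument concrete, here is a minimal pure-Python sketch (no real GPUs or communication). It assumes 8 simulated processes with 2 samples each. Plain BN normalizes with each process's own mean and variance, while SyncBN normalizes with statistics over the whole global batch. Here the global statistics are formed by pooling per-process (count, sum, sum of squares). That pooling is an illustrative assumption and is mathematically equivalent to SyncBN's global statistics, but it is not PyTorch's exact kernel.

```python
import random

def local_stats(xs):
    """Mean and biased variance of one process's mini-batch (what plain BN uses)."""
    mean = sum(xs) / len(xs)
    var = sum((x - mean) ** 2 for x in xs) / len(xs)
    return mean, var

def synced_stats(shards):
    """Global mean and biased variance over all processes' samples.

    Each process contributes (count, sum, sum of squares). In real DDP these
    would be combined with a collective op; here we simply loop over the shards.
    """
    n = sum(len(s) for s in shards)
    total = sum(sum(s) for s in shards)
    total_sq = sum(sum(x * x for x in s) for s in shards)
    mean = total / n
    var = total_sq / n - mean * mean
    return mean, var

random.seed(0)
world_size = 8      # hypothetical number of GPUs
per_gpu_batch = 2   # small per-GPU batch, as in large training runs
shards = [[random.gauss(5.0, 2.0) for _ in range(per_gpu_batch)]
          for _ in range(world_size)]

# With only 2 samples each, per-process estimates scatter widely around
# the true values (mean 5.0, variance 4.0).
for rank, shard in enumerate(shards):
    print(f"rank {rank}: mean/var = {local_stats(shard)}")
print(f"synced: mean/var = {synced_stats(shards)}")
```

With more samples per GPU, the per-rank estimates move closer to the synced ones, which is why a large enough per-GPU batch can make SyncBN unnecessary.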
That said, a lot depends on your model, so try training both with and without SyncBN and check whether it converges in both cases. If it converges fine without SyncBN, I'd recommend leaving it off, because SyncBN adds a performance overhead: every BN layer has to exchange batch statistics across processes in both the forward and backward passes, which adds synchronization to each iteration.
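One way to run that with/without comparison is to put SyncBN behind a flag. This sketch uses `torch.nn.SyncBatchNorm.convert_sync_batchnorm`, which replaces the model's BatchNorm layers with `SyncBatchNorm`. The small CNN and the `build_model` / `use_sync_bn` names are hypothetical, for illustration only.

```python
import torch.nn as nn

def build_model(use_sync_bn: bool) -> nn.Module:
    # Hypothetical tiny CNN; substitute your own model.
    model = nn.Sequential(
        nn.Conv2d(3, 16, kernel_size=3, padding=1),
        nn.BatchNorm2d(16),
        nn.ReLU(),
    )
    if use_sync_bn:
        # Swap every BatchNorm*D layer for nn.SyncBatchNorm.
        # process_group=None (the default) syncs across all processes.
        model = nn.SyncBatchNorm.convert_sync_batchnorm(model)
    return model
```

In an actual DDP job, the conversion is done before wrapping the model in `DistributedDataParallel`. The process group must be initialized before training, and SyncBN's synchronized forward pass expects CUDA tensors. Train once with each setting and compare the convergence curves and per-iteration time.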