I want to implement synchronized batch norm across multiple GPUs. How can I do it? I think I need to synchronize the mean and variance in both the forward and the backward pass, so can I use register_hook? Can someone give me some advice? Thank you.
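To be concrete, by register_hook I mean the tensor-level hook that lets you inspect or replace a gradient during the backward pass; a rough toy sketch of the mechanism I have in mind (the tensors here are just made up for illustration):

```python
import torch

x = torch.randn(4, 3, requires_grad=True)
y = (x * 2).mean()

def scale_grad(grad):
    # the hook sees the gradient flowing into x during backward;
    # returning a tensor replaces that gradient
    return grad * 0.5

x.register_hook(scale_grad)
y.backward()
print(x.grad)
```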
I’m not sure about the implementation, but for what it’s worth, it’s actually recommended NOT to try to synchronize batch norm for multi-node training. Not to deter you from exploring it, but unless you’re specifically investigating it, I’d say it’s not the best place to devote effort for performance improvements.
Hmmm, that’s weird. Isn’t that the exact opposite of DeepLab v3 and PSPNet, where the secret sauce is essentially fine-tuning the BN parameters on the VOC dataset across multiple GPUs?
Yes, PSPNet uses this BN. I am re-implementing the network and the training process, so I have to implement this BN as well.
A brief description of implementing synchronized BN:
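Very roughly, and only as a sketch rather than the exact implementation: each GPU computes its local per-channel sum and squared sum, those get all-reduced across devices, and every GPU then normalizes with the resulting global mean and variance. Assuming one process per GPU with torch.distributed already initialized (the sync_bn_forward helper below is just illustrative):

```python
import torch
import torch.distributed as dist

def sync_bn_forward(x, weight, bias, eps=1e-5):
    # x: (N, C, H, W); each process holds one GPU's shard of the batch.
    n_local = x.numel() // x.size(1)                    # samples per channel on this GPU
    stats = torch.stack([x.sum(dim=(0, 2, 3)),          # per-channel sum
                         (x * x).sum(dim=(0, 2, 3))])   # per-channel squared sum
    dist.all_reduce(stats, op=dist.ReduceOp.SUM)        # combine statistics across all GPUs
    n_global = n_local * dist.get_world_size()
    mean = stats[0] / n_global
    var = stats[1] / n_global - mean * mean
    x_hat = (x - mean[None, :, None, None]) / torch.sqrt(var[None, :, None, None] + eps)
    return weight[None, :, None, None] * x_hat + bias[None, :, None, None]
```

The backward pass needs the mirrored all-reduce on the gradients of the statistics, which is why a real implementation wraps this in a custom autograd Function rather than relying on a plain hook.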
Thank you. I’m just wondering how to communicate across the GPUs.
Basically, you need to customize DataParallel and enable cross-GPU communication for each layer. Good luck.
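If you stay with a single-process DataParallel, the communication primitives for that are in torch.cuda.comm. A rough sketch (hypothetical tensors, assuming two visible GPUs) of summing per-replica statistics onto one device and broadcasting the result back to every replica:

```python
import torch
from torch.cuda import comm

# Hypothetical per-replica statistics living on two GPUs, as they would
# inside a customized DataParallel forward.
devices = [0, 1]
partial = [torch.randn(64, device=f"cuda:{d}") for d in devices]

total = comm.reduce_add(partial, destination=devices[0])  # sum onto GPU 0
copies = comm.broadcast(total, devices)                    # one copy per GPU
```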
Just to update, I have released the Synchronized Multi-GPU Batch Normalization: http://hangzh.com/PyTorch-Encoding/syncbn.html
Any updates on a synchronized batch norm in PyTorch?
ChainerMN has implemented one here
Please check out the PyTorch-compatible synchronized cross-GPU encoding.nn.BatchNorm2d and the example.
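Roughly, the drop-in usage looks like this (a sketch only, with a made-up toy model; it assumes the package is installed and that the constructor mirrors nn.BatchNorm2d, and the linked example covers the full multi-GPU setup):

```python
import torch.nn as nn
import encoding  # PyTorch-Encoding package, assumed installed (e.g. as torch-encoding)

# Toy model: same layers, with the BN layer swapped for the synchronized version.
model = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=3, padding=1),
    encoding.nn.BatchNorm2d(64),   # instead of nn.BatchNorm2d(64)
    nn.ReLU(inplace=True),
)
```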