BN weights are all exactly the same value when using mp + DistributedDataParallel

Update: I have already found the reason.
When I train my code on a single GPU, the BN weights are normal.

But when I use torch.multiprocessing and torch.nn.parallel.DistributedDataParallel across 4 GPUs, it's weird that all the BN weights in a layer end up with the same value.
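
To make it clear what I mean, this is roughly how I check the weights (the toy model below is just a placeholder, my real network is different):

```python
import torch.nn as nn

# Placeholder model just to show the check; my real network is different.
model = nn.Sequential(nn.Conv2d(3, 64, 3), nn.BatchNorm2d(64))
bn = model[1]

# After training on a single GPU the 64 gamma values differ per channel,
# but after training with mp + DDP every channel shows the identical value.
print(bn.weight.data)
print(bn.weight.data.unique().numel())  # comes out as 1 in the DDP run
```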

I am using PyTorch 1.2.0 on Ubuntu 16.04 with CUDA 10.

I use multiprocessing and DistributedDataParallel in my code following https://github.com/pytorch/examples/blob/master/imagenet/main.py.
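
Roughly, my launch code looks like this (a stripped-down sketch; the model, address and port are placeholders, the real code follows the ImageNet example above):

```python
import torch
import torch.distributed as dist
import torch.multiprocessing as mp
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

def main_worker(rank, world_size):
    # One process per GPU, same pattern as the ImageNet example.
    dist.init_process_group(backend='nccl',
                            init_method='tcp://127.0.0.1:23456',  # placeholder address
                            world_size=world_size, rank=rank)
    torch.cuda.set_device(rank)

    # Placeholder model; my real network has many BN layers.
    model = nn.Sequential(nn.Conv2d(3, 64, 3), nn.BatchNorm2d(64)).cuda(rank)
    model = DDP(model, device_ids=[rank])
    # ... normal training loop here ...

if __name__ == '__main__':
    world_size = 4
    mp.spawn(main_worker, args=(world_size,), nprocs=world_size)
```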

Can someone help me?