All BN weights in a layer are exactly the same value when using mp + DistributedDataParallel

I have already found the reason.
When I train on a single GPU, the BN weights look normal.

But when I use torch.multiprocessing and torch.nn.parallel.DistributedDataParallel across 4 GPUs, it's weird that all the BN weights in a layer end up with the same value.

I am using PyTorch 1.2.0, Ubuntu 16.04, and CUDA 10.

I use multiprocessing and DistributedDataParallel in my code like this:
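(My original snippet didn't paste in; for context, my launch is roughly like the following minimal sketch. Function names such as `worker` and the toy `Conv2d + BatchNorm2d` model are placeholders, not my real code, and I use the gloo backend here only so it also runs without GPUs.)

```python
import os

import torch
import torch.distributed as dist
import torch.multiprocessing as mp
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP


def worker(rank, world_size):
    # One process per "GPU"; rendezvous over localhost.
    os.environ["MASTER_ADDR"] = "127.0.0.1"
    os.environ["MASTER_PORT"] = "29533"
    dist.init_process_group("gloo", rank=rank, world_size=world_size)

    # Toy model with a BatchNorm layer whose weight (gamma) we inspect.
    model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.BatchNorm2d(8))
    model = DDP(model)  # broadcasts params from rank 0, then syncs grads

    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    for _ in range(3):
        out = model(torch.randn(4, 3, 16, 16))
        loss = out.mean()
        opt.zero_grad()
        loss.backward()
        opt.step()

    if rank == 0:
        # How many distinct values the BN weight vector holds after training.
        bn_weight = model.module[1].weight.detach()
        print("BN weight unique values:", bn_weight.unique().numel())

    dist.destroy_process_group()


def main():
    world_size = 2  # stand-in for the 4 GPUs in my setup
    mp.spawn(worker, args=(world_size,), nprocs=world_size, join=True)


if __name__ == "__main__":
    main()
```

With this kind of setup I would expect each BN channel's weight to drift apart during training, but in my real run they all stay at one identical value.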

Can someone help me?