Implementing BatchNorm in PyTorch: problem with updating self.running_mean and self.running_var

I additionally found that there is no problem when using only a single GPU, but in the situation above I'm using DataParallel with 2 GPUs. According to the DataParallel tutorial (https://pytorch.org/tutorials/beginner/former_torchies/parallelism_tutorial.html), half of the inputs go to cuda:0 and the other half to cuda:1.
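For reference, here is a minimal sketch of the kind of custom batchnorm I mean (the class name `MyBatchNorm2d` and the exact update formula are just illustrative, not my full code). The running statistics live in registered buffers and are updated in place during training:

```python
import torch
import torch.nn as nn

class MyBatchNorm2d(nn.Module):
    """Minimal custom BatchNorm2d that tracks running statistics in buffers."""
    def __init__(self, num_features, eps=1e-5, momentum=0.1):
        super().__init__()
        self.eps = eps
        self.momentum = momentum
        self.weight = nn.Parameter(torch.ones(num_features))
        self.bias = nn.Parameter(torch.zeros(num_features))
        # Buffers are copied to each replica by DataParallel; in-place updates
        # made on the replicas do not seem to come back to this module.
        self.register_buffer("running_mean", torch.zeros(num_features))
        self.register_buffer("running_var", torch.ones(num_features))

    def forward(self, x):
        if self.training:
            # Per-batch statistics over N, H, W for each channel
            mean = x.mean(dim=(0, 2, 3))
            var = x.var(dim=(0, 2, 3), unbiased=False)
            with torch.no_grad():
                self.running_mean.mul_(1 - self.momentum).add_(self.momentum * mean)
                self.running_var.mul_(1 - self.momentum).add_(self.momentum * var)
        else:
            mean, var = self.running_mean, self.running_var
        x_hat = (x - mean[None, :, None, None]) / torch.sqrt(var[None, :, None, None] + self.eps)
        return x_hat * self.weight[None, :, None, None] + self.bias[None, :, None, None]
```

On a single GPU the buffers update as expected; with DataParallel they stay at their initial values on the original module.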

How can I adjust my batchnorm implementation to work with DataParallel?