Is nn.DataParallel slower in PyTorch 1.0 than in 0.4.1?

I built my model with PyTorch 0.4.1 and trained it on two GPUs, using nn.DataParallel to distribute the model across them, and it worked well for me. However, when I switched to the pytorch_nightly version (pip install torch_nightly), training with the same code got much slower. Has something changed with this multi-GPU method? A rough sketch of how I wrap the model is below.
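This is just the wrapping pattern, not my actual model (the model itself and the device ids are placeholders):

```python
import torch
import torch.nn as nn

# Placeholder model; the real one is more complex, but the wrapping is the same.
model = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 10))

if torch.cuda.device_count() > 1:
    # DataParallel replicates the model and splits each batch across the listed GPUs.
    model = nn.DataParallel(model, device_ids=[0, 1])
model = model.cuda()

# Forward pass: inputs are scattered to both GPUs, outputs gathered back on GPU 0.
x = torch.randn(64, 128).cuda()
out = model(x)
```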

Same here. I updated to 1.0 yesterday and it takes about 50% more time to train the same model on the same data… On the bright side, my loss curves look similar, so it seems to work correctly, just slowly. I am NOT using GPUs yet, so this behavior is observed on my MacBook. My DataLoaders have num_workers=6, and different batch sizes don't seem to help… (roughly the setup sketched below).
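For reference, this is roughly how I'm timing an epoch to compare the two versions; the dataset and batch size here are placeholders, only the DataLoader settings match what I actually use:

```python
import time
import torch
from torch.utils.data import DataLoader, TensorDataset

# Placeholder dataset; the real one loads from disk.
dataset = TensorDataset(torch.randn(10_000, 128), torch.randint(0, 10, (10_000,)))
loader = DataLoader(dataset, batch_size=64, shuffle=True, num_workers=6)

start = time.time()
for inputs, targets in loader:
    pass  # the actual training step goes here
print(f"time to iterate one epoch: {time.time() - start:.2f}s")
```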