Hi,

I’m training my model on a 2-GPU system with CUDA 11.0 and PyTorch 1.7.1. With DDP on a single GPU, the loss converges faster. When I use both GPUs, convergence is noticeably slower.

What could be the problem? I tried fixing the random seed, but the results are the same.

1-GPU Epochs with DDP:

```
===> Epoch 0 Complete: Avg. Loss: 0.36975980444256995
===> Epoch 1 Complete: Avg. Loss: 0.32454686222479784
===> Epoch 2 Complete: Avg. Loss: 0.3071180762120964
===> Epoch 3 Complete: Avg. Loss: 0.2750444378917671
===> Epoch 4 Complete: Avg. Loss: 0.24473399923287129
===> Epoch 5 Complete: Avg. Loss: 0.21892599486872508
===> Epoch 6 Complete: Avg. Loss: 0.20032298285795483
===> Epoch 7 Complete: Avg. Loss: 0.1875871649501547
===> Epoch 8 Complete: Avg. Loss: 0.17785485737093265
===> Epoch 9 Complete: Avg. Loss: 0.17077650971643155
```

2-GPU Epochs with DDP:

```
===> Epoch 0 Complete: Avg. Loss: 0.4675159709281232
===> Epoch 1 Complete: Avg. Loss: 0.33464150040982715
===> Epoch 2 Complete: Avg. Loss: 0.330990740333695
===> Epoch 3 Complete: Avg. Loss: 0.3278805889997138
===> Epoch 4 Complete: Avg. Loss: 0.3254638583545225
===> Epoch 5 Complete: Avg. Loss: 0.3231941443609904
===> Epoch 6 Complete: Avg. Loss: 0.3184018903468029
===> Epoch 7 Complete: Avg. Loss: 0.3123437018997698
===> Epoch 8 Complete: Avg. Loss: 0.30351564180420104
===> Epoch 9 Complete: Avg. Loss: 0.29410275745104597
```
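
To put a number on the gap, here is a quick sketch that compares the two runs using the values copied from the logs above. The helper names (`relative_drop`, `first_epoch_below`) and the 0.30 threshold are only my own choices for illustration, not part of the training code.

```python
# Avg. loss per epoch, copied from the two logs above.
loss_1gpu = [
    0.36975980444256995, 0.32454686222479784, 0.3071180762120964,
    0.2750444378917671, 0.24473399923287129, 0.21892599486872508,
    0.20032298285795483, 0.1875871649501547, 0.17785485737093265,
    0.17077650971643155,
]
loss_2gpu = [
    0.4675159709281232, 0.33464150040982715, 0.330990740333695,
    0.3278805889997138, 0.3254638583545225, 0.3231941443609904,
    0.3184018903468029, 0.3123437018997698, 0.30351564180420104,
    0.29410275745104597,
]


def relative_drop(losses):
    """Fraction by which the loss fell between the first and last epoch."""
    return (losses[0] - losses[-1]) / losses[0]


def first_epoch_below(losses, threshold):
    """Index of the first epoch whose loss is below threshold, or None."""
    for epoch, loss in enumerate(losses):
        if loss < threshold:
            return epoch
    return None


# The threshold is an arbitrary reference point picked for this comparison.
threshold = 0.30
for name, losses in (("1-GPU", loss_1gpu), ("2-GPU", loss_2gpu)):
    print(
        f"{name}: relative drop {relative_drop(losses):.1%}, "
        f"first epoch below {threshold}: {first_epoch_below(losses, threshold)}"
    )
```

So the single-GPU run gets below 0.30 at epoch 3, while the 2-GPU run only gets there at epoch 9.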

Regards,

MJay