Performance degrades with DataParallel

Hi guys, when I train my model on one GPU I get x% performance (accuracy).
Now I’m trying to make my model a bit bigger. I don’t have enough memory on a single GPU, so I used 2 GPUs with DataParallel, of course.
The results weren’t good, so I took my small model again and ran it on 2 GPUs with DataParallel: the same model that got x% on 1 GPU now gets (x-3)%, even after running it multiple times.

Does anyone have an idea?
Note that I’m talking about DataParallel, not DistributedDataParallel.


Did you double the batch size when switching to nn.DataParallel, or are you using a batch size <2x?
The batch will be chunked along dim0 and each chunk will be sent to a specific GPU.
If you don’t increase the batch size, each GPU will see a smaller batch, which might decrease the performance, e.g. due to noisy running estimates in batch norm layers.
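A minimal sketch of the chunking arithmetic, assuming the batch is split along dim0 the way `torch.chunk` splits it (`per_gpu_batch` is a hypothetical helper, not a PyTorch function):

```python
# Sketch of how nn.DataParallel divides a global batch across GPUs.
# Assumes torch.chunk-style splitting: ceil-sized chunks, last one smaller.

def per_gpu_batch(global_batch, num_gpus):
    """Approximate per-GPU chunk sizes, mirroring torch.chunk along dim0."""
    chunk = -(-global_batch // num_gpus)  # ceil division
    sizes = []
    remaining = global_batch
    while remaining > 0:
        sizes.append(min(chunk, remaining))
        remaining -= chunk
    return sizes

# Keeping the single-GPU batch size of 64 on 2 GPUs halves each GPU's batch,
# so the batch norm layers see noisier statistics:
print(per_gpu_batch(64, 2))   # [32, 32]
# Doubling the global batch restores the per-GPU batch of 64:
print(per_gpu_batch(128, 2))  # [64, 64]
```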

I have 4 GPUs, so I multiplied the batch size by 4 together with the LR by 3.69, but the accuracy of the model seems to decrease. Should the LR be multiplied by the same factor as the batch size? And what if I use a batch size 3 times the original but train on 4 GPUs, will this influence the performance?
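For reference, the common heuristic here is the linear scaling rule: scale the learning rate by the same factor as the global batch size (a rule of thumb, not a guarantee). A sketch of the arithmetic, where `scale_lr` and the base values are illustrative:

```python
# Linear scaling rule (heuristic): new_lr = base_lr * new_batch / base_batch.
# The LR follows the *global* batch size, not the number of GPUs.

def scale_lr(base_lr, base_batch, new_batch):
    return base_lr * new_batch / base_batch

base_lr, base_batch = 0.1, 64  # illustrative single-GPU settings

# 4x batch on 4 GPUs -> 4x LR (approx. 0.4):
print(scale_lr(base_lr, base_batch, 4 * base_batch))
# 3x batch on 4 GPUs -> 3x LR (approx. 0.3); each GPU then just sees a
# smaller chunk (3 * 64 / 4 = 48 samples per GPU):
print(scale_lr(base_lr, base_batch, 3 * base_batch))
```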

Final question: which data-parallel solution would best maintain (or even improve) the model’s performance compared with training on one GPU?
nn.DataParallel vs. DistributedDataParallel vs. PyTorch Lightning / Horovod

I usually recommend using DistributedDataParallel, as it’s faster than nn.DataParallel and also allows you to use SyncBatchNorm layers to share the running stats. I’m not familiar with Horovod, so I cannot comment on it.
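A minimal single-node DDP + SyncBatchNorm sketch, assuming a `torchrun` launch; the model is a placeholder, not the poster’s actual architecture:

```python
# Minimal DistributedDataParallel sketch. Launch with one process per GPU,
# e.g.: torchrun --nproc_per_node=4 train.py
# The tiny model below is a placeholder to show the SyncBatchNorm conversion.
import os

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP


def main():
    dist.init_process_group("nccl")  # one process per GPU
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = torch.nn.Sequential(
        torch.nn.Conv2d(3, 16, 3),
        torch.nn.BatchNorm2d(16),
        torch.nn.ReLU(),
    ).cuda(local_rank)

    # Replace BatchNorm layers so running stats are synchronized across
    # GPUs instead of being estimated per chunk (do this before DDP wrap):
    model = torch.nn.SyncBatchNorm.convert_sync_batchnorm(model)
    model = DDP(model, device_ids=[local_rank])
    # ... build a DataLoader with a DistributedSampler and train as usual ...


if __name__ == "__main__":
    main()
```

Since this needs multiple GPUs and a distributed launcher, treat it as a launch recipe rather than something to run as-is.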