DistributedDataParallel hurts model performance

I am wondering whether it is expected that multi-GPU parallel training hurts the model's performance? (I am using DistributedDataParallel, by the way.)

On a single GPU I use a batch size of 32. To test on 2 GPUs, I set the batch size to 16 per GPU, so the overall batch size is still 32, the same as on the single GPU (all other hyperparameters stay the same); my setup is roughly like the sketch below. The model's performance drops a lot when using 2 GPUs. Is there any possible reason why this happens?
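Roughly how I split the batch across the 2 GPUs (just a sketch, launched with one process per GPU, e.g. via torchrun; the real dataset and model are omitted, TensorDataset is only a stand-in):

```python
import torch
import torch.distributed as dist
from torch.utils.data import DataLoader, DistributedSampler, TensorDataset

dist.init_process_group("nccl")  # one process per GPU

# Stand-in for my real dataset.
train_dataset = TensorDataset(torch.randn(1024, 3, 32, 32),
                              torch.randint(0, 10, (1024,)))

# batch_size here is per rank: 16 per GPU x 2 GPUs = 32 samples per step,
# matching the single-GPU batch size of 32.
sampler = DistributedSampler(train_dataset, shuffle=True)
train_loader = DataLoader(train_dataset, batch_size=16, sampler=sampler)
```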

Thanks!

Your model’s performance might depend on the batch size, e.g. if it’s using batchnorm layers.
Since each rank now sees a smaller batch (16 instead of 32), the batch norm statistics are computed over fewer samples, which can change the results. You could try SyncBatchNorm to check whether that is indeed the root cause.
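For example, something along these lines before wrapping the model in DDP (a minimal sketch; `MyModel` is a placeholder for your own model):

```python
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

dist.init_process_group("nccl")
local_rank = dist.get_rank() % torch.cuda.device_count()
torch.cuda.set_device(local_rank)

model = MyModel().cuda(local_rank)  # placeholder for your model

# Replace every BatchNorm*d layer with SyncBatchNorm so normalization
# statistics are computed over the global batch (32) rather than the
# per-rank batch (16).
model = torch.nn.SyncBatchNorm.convert_sync_batchnorm(model)
model = DDP(model, device_ids=[local_rank])
```

Another quick sanity check: train on a single GPU with batch size 16. If the performance drops there as well, the smaller per-device batch (and its effect on the batch norm statistics) is the likely cause.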