I am wondering: is it reasonable that using multiple GPUs in parallel hurts the model's performance? (I am using DistributedDataParallel, btw.)
On a single GPU I use a batch size of 32. To test on 2 GPUs, I set the batch size to 16 per GPU, so the overall batch size is still 32, the same as in the single-GPU run (all other hyperparameters stay the same). The model's performance drops a lot when using 2 GPUs. Is there any possible reason why this happens?
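Concretely, the batch-size split I'm using looks like this (the helper function and its name are just for illustration, not from torch):

```python
# Sketch of the batch-size bookkeeping described above: with
# DistributedDataParallel, each process works on its own local batch,
# so I divide the global batch evenly across GPUs to keep the overall
# batch size at 32.

def local_batch_size(global_batch: int, world_size: int) -> int:
    """Per-process batch size so that world_size * local == global."""
    if global_batch % world_size != 0:
        raise ValueError("global batch must divide evenly across GPUs")
    return global_batch // world_size

print(local_batch_size(32, 1))  # single GPU -> 32
print(local_batch_size(32, 2))  # 2 GPUs -> 16 each
```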