Multiprocessing for multiple GPUs

I was wondering why it is not advised to use multiple GPUs via multiprocessing. For example, towards the end you have the advice “Use nn.DataParallel instead of multiprocessing”, yet there is also an example of using multiple GPUs with multiprocessing.
Which is the correct rule? The two seem contradictory.


I have the same question as you.

I can’t speak to the specifics of the guidelines here. In my own usage, DataParallel is the quick and easy way to get going with multiple GPUs on a single machine.
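
For reference, this is roughly what the single-process DataParallel path looks like; the model and sizes here are placeholders, not anything from the posts above:

```python
import torch
import torch.nn as nn

# Toy model standing in for a real network; shapes are placeholders.
model = nn.Sequential(nn.Linear(512, 256), nn.ReLU(), nn.Linear(256, 10))
model = nn.DataParallel(model).cuda()  # replicates the model across all visible GPUs

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

inputs = torch.randn(64, 512).cuda()          # batch is split across GPUs along dim 0
targets = torch.randint(0, 10, (64,)).cuda()

optimizer.zero_grad()
loss = criterion(model(inputs), targets)
loss.backward()
optimizer.step()
```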

However, if you want to push the performance, I’ve found that using the NVIDIA apex implementation of DistributedDataParallel with one GPU per process, plus a few of their other optimizations, better saturates the GPUs on a single machine and usually results in roughly 10-15% higher throughput.

I can’t speak to how apex DDP compares to the PyTorch native implementation; I switched to apex because I’m also using AMP mixed precision from time to time.
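
To make the pattern concrete, here is a minimal sketch of the one-GPU-per-process setup with apex DDP plus AMP; the model, sizes, and launch command are assumptions for illustration, not taken from the posts:

```python
# Launch one process per GPU, e.g.:
#   python -m torch.distributed.launch --nproc_per_node=3 train.py
import argparse

import torch
import torch.nn as nn
from apex import amp
from apex.parallel import DistributedDataParallel as DDP

parser = argparse.ArgumentParser()
parser.add_argument("--local_rank", type=int, default=0)  # filled in by the launcher
args = parser.parse_args()

torch.cuda.set_device(args.local_rank)  # pin this process to a single GPU
torch.distributed.init_process_group(backend="nccl", init_method="env://")

model = nn.Linear(512, 10).cuda()  # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# O1 mixed precision: casts eligible ops to FP16, keeps the rest in FP32
model, optimizer = amp.initialize(model, optimizer, opt_level="O1")
model = DDP(model)  # all-reduces gradients across the processes

inputs = torch.randn(64, 512).cuda()
targets = torch.randint(0, 10, (64,)).cuda()

optimizer.zero_grad()
loss = nn.functional.cross_entropy(model(inputs), targets)
with amp.scale_loss(loss, optimizer) as scaled_loss:  # loss scaling for FP16
    scaled_loss.backward()
optimizer.step()
```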

Their example is a good reference:


For curiosity’s sake, I ran a quick test on a machine that I recently bumped up to 3 Pascal GPUs. The previous comparison was made with 2 x RTX cards.

For ImageNet-style training @ 224x224 with a smaller model (something like a MnasNet/MobileNetV2) on an 8 physical core CPU:
830 img/sec avg - single training process, 3 GPUs, torch.nn.DataParallel, 8 (or 9 for fairness) worker processes
1015 img/sec avg - 3 training processes, 1 GPU per process, apex.DistributedDataParallel, 3 workers per training process

Everything else in those two runs is the same: same preprocessing, using the same ‘fast’ preload + collation routines from NVIDIA’s examples. So it looks like throwing in another GPU increases the gap to >20% (1015 / 830 ≈ 1.22).
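
For completeness, a sketch of how the per-process data loading in the second run might be wired up (a DistributedSampler shard plus 3 loader workers per process); the dataset here is a stand-in, and this assumes init_process_group() has already run as in the DDP sketch above:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.distributed import DistributedSampler

# Placeholder dataset; a real run would use an ImageFolder-style dataset.
dataset = TensorDataset(torch.randn(1024, 3, 224, 224),
                        torch.randint(0, 1000, (1024,)))
sampler = DistributedSampler(dataset)  # gives each process a disjoint shard
loader = DataLoader(dataset, batch_size=64, sampler=sampler,
                    num_workers=3, pin_memory=True)

for epoch in range(2):
    sampler.set_epoch(epoch)  # reshuffle the shards each epoch
    for images, labels in loader:
        images = images.cuda(non_blocking=True)
        labels = labels.cuda(non_blocking=True)
        # forward/backward as in the DDP sketch above
```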