DistributedDataParallel with single process slower than single GPU

mrshenli · January 17, 2020, 9:30pm

DistributedDataParallel’s single-process-multi-GPU mode is not recommended, because it does parameter replication, input splitting, output gathering, etc. in every iteration, and the Python GIL might get in the way. If you only have one machine and run a single process on it, then it will behave very similarly to DataParallel.

The recommended solution is to use single-process-single-GPU mode. In your use case with two GPUs, that means you spawn two processes, and each process works exclusively on one GPU. This should be faster than your current setup.
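A minimal sketch of the one-process-per-GPU setup, assuming a single machine and the `torch.multiprocessing.spawn` launcher. The toy `nn.Linear` model, the random data, the `127.0.0.1:29500` rendezvous address, and the Gloo/CPU fallback (so the script also runs on a machine without two GPUs) are illustrative assumptions, not part of the original answer. Each worker also checks that all replicas end with identical weights, which is what DDP's gradient averaging should guarantee.

```python
import os

import torch
import torch.distributed as dist
import torch.multiprocessing as mp
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP


def worker(rank: int, world_size: int) -> None:
    # One process per GPU: NCCL on GPUs, Gloo on CPU as a fallback (assumption, for testing).
    use_cuda = torch.cuda.is_available() and torch.cuda.device_count() >= world_size
    dist.init_process_group("nccl" if use_cuda else "gloo", rank=rank, world_size=world_size)
    device = torch.device(f"cuda:{rank}") if use_cuda else torch.device("cpu")
    if use_cuda:
        torch.cuda.set_device(device)

    # Seed by rank so each process starts with different weights and data,
    # mimicking each process reading its own shard of the dataset.
    torch.manual_seed(rank)
    model = nn.Linear(10, 1).to(device)
    # DDP broadcasts rank 0's parameters at construction; this process only touches its own device.
    ddp_model = DDP(model, device_ids=[rank] if use_cuda else None)
    optimizer = torch.optim.SGD(ddp_model.parameters(), lr=0.01)
    loss_fn = nn.MSELoss()

    for _ in range(3):
        x = torch.randn(32, 10, device=device)
        y = torch.randn(32, 1, device=device)
        optimizer.zero_grad()
        loss_fn(ddp_model(x), y).backward()  # gradients are all-reduced (averaged) across ranks here
        optimizer.step()

    # Sanity check: with averaged gradients, every replica should still hold identical weights.
    flat = torch.cat([p.detach().flatten() for p in ddp_model.parameters()])
    gathered = [torch.empty_like(flat) for _ in range(world_size)]
    dist.all_gather(gathered, flat)
    assert all(torch.allclose(gathered[0], g) for g in gathered)

    dist.destroy_process_group()


def main(world_size: int = 2) -> None:
    # Rendezvous settings for the default env:// init method; address and port are illustrative.
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    # Spawn one process per GPU; each receives its rank as the first argument.
    mp.spawn(worker, args=(world_size,), nprocs=world_size, join=True)


if __name__ == "__main__":
    main(2)
```

Because each process pins itself to one device and passes `device_ids=[rank]`, DDP never has to replicate the module or split inputs inside a process, and each process has its own GIL.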