Multi GPU training on single node with DistributedDataParallel

zlapp · August 12, 2020, 12:24pm

Hi,

I’m new to distributed training.

When I train with DistributedDataParallel, do I get the functionality of DataParallel? That is, on a single node with more than one GPU, can I assume that all GPUs on that node will be utilized?

Thanks,
Zlapp

mrshenli · August 12, 2020, 2:02pm

Yep, DistributedDataParallel (DDP) can utilize multiple GPUs on the same node, but it works differently from DataParallel (DP): DDP uses multiple processes, one per GPU, while DP is single-process and multi-threaded.

See this page for the comparison between the two: https://pytorch.org/tutorials/beginner/dist_overview.html#data-parallel-training

and this to get started with DDP: https://pytorch.org/docs/stable/notes/ddp.html
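
To make the one-process-per-GPU point concrete, here is a rough single-node sketch. It is only an illustration, not the official recipe: the toy `nn.Linear` model, the random batch, the `127.0.0.1:29500` rendezvous address, and the helper names `train_step` and `launch` are all assumptions made for the example. It falls back to the `gloo` backend on CPU when there are not enough GPUs.

```python
import os

import torch
import torch.distributed as dist
import torch.multiprocessing as mp
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP


def train_step(rank: int, world_size: int) -> float:
    """Run one DDP training step in the process identified by `rank`."""
    # Rendezvous settings for a single node; address and port are assumed values.
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")

    # Use NCCL with one GPU per process when possible, otherwise gloo on CPU.
    use_cuda = torch.cuda.is_available() and torch.cuda.device_count() >= world_size
    dist.init_process_group("nccl" if use_cuda else "gloo",
                            rank=rank, world_size=world_size)
    try:
        if use_cuda:
            torch.cuda.set_device(rank)
            device = torch.device("cuda", rank)
            model = DDP(nn.Linear(10, 1).to(device), device_ids=[rank])
        else:
            device = torch.device("cpu")
            model = DDP(nn.Linear(10, 1))

        optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

        # Toy data: each process draws its own random batch.
        inputs = torch.randn(8, 10, device=device)
        targets = torch.randn(8, 1, device=device)

        optimizer.zero_grad()
        loss = nn.functional.mse_loss(model(inputs), targets)
        loss.backward()  # DDP averages gradients across processes here
        optimizer.step()
        return loss.item()
    finally:
        dist.destroy_process_group()


def launch() -> None:
    """Spawn one process per visible GPU (or a single CPU process if none)."""
    world_size = torch.cuda.device_count() or 1
    # mp.spawn passes the process index as the first argument to train_step.
    mp.spawn(train_step, args=(world_size,), nprocs=world_size, join=True)
```

To try it, call `launch()` from under an `if __name__ == "__main__":` guard in a script. So yes, all GPUs on the node get used, but only because you start one process for each of them; a real training script would also shard the dataset per rank, for example with `torch.utils.data.distributed.DistributedSampler`.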

zlapp · August 12, 2020, 2:10pm

Great, thanks for the answer and the references.