Multi-GPU training on a single node with DistributedDataParallel

Hi,

I’m new to distributed training.

When I train with DistributedDataParallel, do I get the functionality of DataParallel? That is, can I assume that on a single node with more than one GPU, all GPUs on that node will be utilized?

Thanks,
Zlapp

Yep, DistributedDataParallel (DDP) can utilize multiple GPUs on the same node, but it works differently from DataParallel (DP): DDP uses multiple processes, one process per GPU, while DP is single-process, multi-threaded.
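
For a concrete picture, here is a minimal single-node sketch that spawns one process per local GPU with `torch.multiprocessing` and wraps a toy model in DDP. The model, dimensions, and master address/port are placeholders; adapt them to your setup.

```python
import os
import torch
import torch.distributed as dist
import torch.multiprocessing as mp
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP


def run(rank, world_size):
    # Each process drives exactly one GPU, identified by its rank.
    os.environ["MASTER_ADDR"] = "localhost"
    os.environ["MASTER_PORT"] = "29500"
    dist.init_process_group("nccl", rank=rank, world_size=world_size)
    torch.cuda.set_device(rank)

    # Toy model; replace with your own. Dimensions are placeholders.
    model = nn.Linear(10, 1).to(rank)
    ddp_model = DDP(model, device_ids=[rank])

    optimizer = torch.optim.SGD(ddp_model.parameters(), lr=0.01)
    loss_fn = nn.MSELoss()

    for _ in range(5):
        optimizer.zero_grad()
        inputs = torch.randn(20, 10, device=rank)
        targets = torch.randn(20, 1, device=rank)
        loss = loss_fn(ddp_model(inputs), targets)
        loss.backward()  # gradients are all-reduced across processes here
        optimizer.step()

    dist.destroy_process_group()


if __name__ == "__main__":
    world_size = torch.cuda.device_count()  # one process per local GPU
    mp.spawn(run, args=(world_size,), nprocs=world_size, join=True)
```

In a real training script you would also use a `DistributedSampler` in your `DataLoader` so each process sees a different shard of the dataset.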

See this page for the comparison between the two: https://pytorch.org/tutorials/beginner/dist_overview.html#data-parallel-training

and this to get started with DDP: https://pytorch.org/docs/stable/notes/ddp.html

Great, thanks for the answer and the references!