What is the difference between DataParallel and DistributedDataParallel?

Dawny33 · August 11, 2017, 12:40pm

I am going through this imagenet example: https://github.com/pytorch/examples/blob/master/imagenet/main.py

On line 88, the DistributedDataParallel module is used. When I searched the docs for it, I couldn’t find anything. Could someone point me to the documentation for this module, if it exists?

Otherwise, I would like to know what the difference is between the DataParallel and DistributedDataParallel modules.

fmassa · August 11, 2017, 7:57pm

DataParallel is for training on multiple GPUs on a single machine.
DistributedDataParallel is useful when you want to train across multiple machines.
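
For reference, here is a minimal sketch of the single-machine DataParallel case. The tiny `nn.Linear` model, the batch shape, and the helper name `build_data_parallel_model` are made up for illustration and are not taken from the imagenet example. On a machine without GPUs the wrapper simply calls the underlying module, so the snippet still runs.

```python
import torch
import torch.nn as nn


def build_data_parallel_model() -> nn.Module:
    # Hypothetical toy model; any nn.Module can be wrapped the same way.
    model = nn.Linear(10, 2)
    if torch.cuda.is_available():
        model = model.cuda()
    # One Python process drives every visible GPU: each forward pass
    # splits the batch along dim 0 and runs one replica per GPU.
    return nn.DataParallel(model)


model = build_data_parallel_model()
device = "cuda" if torch.cuda.is_available() else "cpu"
batch = torch.randn(8, 10, device=device)  # 8 samples, split across GPUs
output = model(batch)                      # outputs are gathered on one device
print(output.shape)
```

The rest of the training loop stays the same as for an unwrapped model, which is the main appeal of DataParallel.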

csarofeen · May 4, 2018, 4:54pm

Sorry for resurrecting this old thread. The answer above has caused some confusion among folks I’ve talked to.

Distributed Data Parallel can very much be advantageous performance-wise for single-node, multi-GPU runs. When run in a 1 gpu / process configuration Distributed Data Parallel can be beneficial as CPU based overheads are now spread across multiple processes.

The performance gains are especially prominent in networks that have many small layers or operations. I primarily recommend to folks that they use single gpu / process Distributed Data Parallel over Data Parallel even for single node cases if they want to scale past 2 GPUs.
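
A rough sketch of the one-GPU-per-process setup on a single node is below. For context, DataParallel drives all GPUs from a single Python process using threads, so the Python-side work for every replica competes for one interpreter, whereas here each process issues work for its own GPU only. The names `run_worker` and `launch`, the toy `nn.Linear` model, the random data, and the file-based rendezvous are all assumptions chosen for illustration, and the code falls back to the `gloo` backend on CPU when there are not enough GPUs.

```python
import os
import tempfile

import torch
import torch.distributed as dist
import torch.multiprocessing as mp
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP


def run_worker(rank: int, world_size: int, init_method: str) -> float:
    """Run one training step in one process that owns at most one GPU."""
    use_cuda = (
        torch.cuda.is_available()
        and torch.cuda.device_count() >= world_size
        and dist.is_nccl_available()
    )
    backend = "nccl" if use_cuda else "gloo"
    dist.init_process_group(
        backend, init_method=init_method, rank=rank, world_size=world_size
    )
    try:
        device = torch.device(f"cuda:{rank}") if use_cuda else torch.device("cpu")
        if use_cuda:
            torch.cuda.set_device(device)

        # Hypothetical toy model; each process holds its own full copy.
        model = nn.Linear(10, 2).to(device)
        ddp_model = DDP(model, device_ids=[rank] if use_cuda else None)
        optimizer = torch.optim.SGD(ddp_model.parameters(), lr=0.01)

        # Random stand-in data; a real job would give each rank its own shard.
        inputs = torch.randn(8, 10, device=device)
        targets = torch.randn(8, 2, device=device)

        loss = nn.functional.mse_loss(ddp_model(inputs), targets)
        optimizer.zero_grad()
        loss.backward()   # gradients are averaged across processes here
        optimizer.step()
        return loss.item()
    finally:
        dist.destroy_process_group()


def launch(world_size: int) -> None:
    """Start `world_size` processes on this machine, one per GPU."""
    init_file = os.path.join(tempfile.mkdtemp(), "ddp_init")
    # mp.spawn passes the process index as the first argument (the rank).
    mp.spawn(
        run_worker,
        args=(world_size, f"file://{init_file}"),
        nprocs=world_size,
        join=True,
    )
```

To try it, call something like `launch(torch.cuda.device_count())` from under an `if __name__ == "__main__":` guard in a script, since `mp.spawn` re-imports the main module in each child process.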

Toru · May 18, 2018, 2:37pm

Can you please elaborate on “When run in a 1 gpu / process configuration Distributed Data Parallel can be beneficial as CPU based overheads are now spread across multiple processes”? Thanks!

Hjjiang · April 1, 2021, 6:39am

Totally agree with you!
“I primarily recommend to folks that they use single gpu / process Distributed Data Parallel over Data Parallel even for single node cases if they want to scale past 2 GPUs.”