Why does DataParallel replicate the module during forward?

I’m reading the code for DataParallel, but I don’t quite understand why module replication happens during each call to forward.
I thought we would replicate the module once and then execute forward multiple times. Or is this class designed for the training phase, because backward propagation would require parameter averaging (synchronization) after each forward call?

The scatter and gather operations are used because nn.DataParallel is a “simple” data parallel implementation: the module is replicated in every forward pass, since the parameters are only updated on the default device after each optimizer step, so the per-GPU replicas would otherwise become stale. This blog post explains the overall workflow in more detail.
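For reference, here is a minimal sketch of what each forward call does, expressed with the functional primitives from torch.nn.parallel (replicate, scatter, parallel_apply, gather). The helper name, the two-GPU setup, and the tiny nn.Linear model are illustrative assumptions, not the actual DataParallel source:

```python
import torch
import torch.nn as nn

def naive_data_parallel_forward(module, inputs, device_ids, output_device=0):
    # 1. Copy parameters/buffers from the source GPU to every device.
    #    This replication is repeated on every forward call.
    replicas = nn.parallel.replicate(module, device_ids)
    # 2. Chunk the batch along dim 0 and send one chunk to each device.
    scattered = nn.parallel.scatter(inputs, device_ids)
    replicas = replicas[:len(scattered)]
    # 3. Run each replica on its chunk in a separate thread.
    outputs = nn.parallel.parallel_apply(replicas, scattered)
    # 4. Copy the per-device outputs back to the output device.
    return nn.parallel.gather(outputs, output_device)

# Illustrative usage (assumes at least two visible GPUs):
model = nn.Linear(10, 5).cuda(0)
batch = torch.randn(8, 10, device="cuda:0")
out = naive_data_parallel_forward(model, batch, device_ids=[0, 1])
```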
This per-iteration overhead is also why we recommend using DistributedDataParallel with a single process per GPU, which avoids these copies and thus yields the best performance.
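A minimal sketch of that recommended setup, assuming the script is launched with torchrun (which sets the LOCAL_RANK environment variable used below):

```python
import os
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # One process per GPU; torchrun provides rank/world-size env vars.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = nn.Linear(10, 5).cuda(local_rank)
    # Parameters are broadcast once at construction; there is no
    # per-forward replication as in nn.DataParallel.
    ddp_model = DDP(model, device_ids=[local_rank])

    inputs = torch.randn(8, 10, device=local_rank)
    loss = ddp_model(inputs).sum()
    loss.backward()  # gradients are all-reduced across processes here

    dist.destroy_process_group()

if __name__ == "__main__":
    main()

# Launch (illustrative): torchrun --nproc_per_node=<num_gpus> this_script.py
```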
