Data-parallel solution comparisons: which is the best data-parallel solution? nn.DataParallel() vs DistributedDataParallel vs PyTorch Lightning vs Horovod vs any other

What would be the best data-parallel solution for keeping the model's performance the same (or even better) compared with training on a single GPU?


We recommend using DistributedDataParallel over nn.DataParallel, as the latter relies on Python threading within a single process, which is slow due to the GIL; DistributedDataParallel instead runs one process per GPU.
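A minimal sketch of the recommended setup, assuming one process per device spawned via `torch.multiprocessing` (the toy `nn.Linear` model, the `"gloo"` CPU backend, and the master address/port values are illustrative placeholders; in real multi-GPU training you would use the `"nccl"` backend and your own model):

```python
import os
import torch
import torch.distributed as dist
import torch.multiprocessing as mp
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

def worker(rank, world_size):
    # Each process owns one rank; "gloo" works on CPU, use "nccl" for GPUs.
    os.environ["MASTER_ADDR"] = "127.0.0.1"
    os.environ["MASTER_PORT"] = "29500"
    dist.init_process_group("gloo", rank=rank, world_size=world_size)

    model = nn.Linear(10, 1)   # toy model; replace with your own
    ddp_model = DDP(model)     # gradients are all-reduced across ranks

    opt = torch.optim.SGD(ddp_model.parameters(), lr=0.01)
    loss = ddp_model(torch.randn(8, 10)).sum()
    loss.backward()            # backward() triggers the gradient all-reduce
    opt.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    world_size = 2             # one process per GPU in real use
    mp.spawn(worker, args=(world_size,), nprocs=world_size)
```

Because each rank is a separate process, the GIL bottleneck of nn.DataParallel's single-process threading does not apply.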

Regarding comparisons to PyTorch Lightning: Lightning offers DDP as a plugin and calls into DDP under the hood, so the performance should be comparable. I'm not aware of any performance comparisons between DDP and Horovod, unfortunately.

A couple of publications do compare PyTorch, Horovod, and other frameworks: the AWS Herring paper, and a Ray blog post that does a similar comparison.
