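We recommend using DistributedDataParallel (DDP) over nn.DataParallel, as the latter relies on Python threading within a single process, which is slow due to the GIL. DDP instead runs one process per device, so the replicas do not contend for the GIL.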
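For reference, here is a minimal sketch of what wrapping a model in DDP looks like. It is not a tuned setup: the `run_one_step` helper, the toy `nn.Linear` model, the random data, the `gloo` backend on CPU, and the file-based rendezvous in a temp directory are all assumptions I picked so that it runs as a single process on one machine.

```python
import os
import tempfile

import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP


def run_one_step(rank: int = 0, world_size: int = 1) -> float:
    """Run one training step with a DDP-wrapped toy model and return the loss."""
    # Assumption: file-based rendezvous in a fresh temp dir, convenient for a
    # single-machine demo. Real jobs usually rely on env vars set by a launcher.
    init_file = os.path.join(tempfile.mkdtemp(), "ddp_init")
    dist.init_process_group(
        backend="gloo",  # CPU-friendly backend; "nccl" is the usual choice on GPUs
        init_method=f"file://{init_file}",
        rank=rank,
        world_size=world_size,
    )
    try:
        torch.manual_seed(0)
        # Each process holds its own replica; DDP all-reduces gradients in backward().
        model = DDP(nn.Linear(4, 2))
        optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

        inputs, targets = torch.randn(8, 4), torch.randn(8, 2)
        loss = nn.functional.mse_loss(model(inputs), targets)

        optimizer.zero_grad()
        loss.backward()  # gradient synchronization across ranks happens here
        optimizer.step()
        return loss.item()
    finally:
        dist.destroy_process_group()


if __name__ == "__main__":
    print(f"loss after one step: {run_one_step():.4f}")
```

With more than one process you would typically start the script with `torchrun`, which sets `RANK`, `WORLD_SIZE`, `MASTER_ADDR`, and `MASTER_PORT` for each process, and you would move each replica to its own GPU before wrapping it.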
Regarding comparisons to PyTorch Lightning: Lightning offers DDP as a plugin and calls into DDP under the hood, so the performance should be comparable. I’m not aware of any performance comparisons between DDP and Horovod, unfortunately.
A couple of sources do compare PyTorch, Horovod, and other frameworks: the AWS Herring paper, and a Ray blog post that does a similar comparison.