torch.nn.parallel.DistributedDataParallel vs Horovod

As stated in the DDP docs, DistributedDataParallel can run across multiple machines:

> DistributedDataParallel (DDP) implements data parallelism at the module level which can run across multiple machines. Applications using DDP should spawn multiple processes and create a single DDP instance per process.
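
For reference, here is a minimal single-node sketch of that pattern (one process per worker, one DDP instance per process). It uses the `gloo` backend on CPU so it runs without GPUs; the port `29500`, the world size of 2, and the toy `Linear` model are arbitrary choices for illustration:

```python
import os
import torch
import torch.distributed as dist
import torch.multiprocessing as mp
from torch.nn.parallel import DistributedDataParallel as DDP


def worker(rank, world_size):
    # Rendezvous info; address/port are arbitrary for this local sketch.
    os.environ["MASTER_ADDR"] = "127.0.0.1"
    os.environ["MASTER_PORT"] = "29500"
    # Each spawned process joins the same process group.
    dist.init_process_group("gloo", rank=rank, world_size=world_size)

    model = torch.nn.Linear(10, 10)  # toy model
    ddp_model = DDP(model)           # a single DDP instance per process

    loss = ddp_model(torch.randn(4, 10)).sum()
    loss.backward()  # gradients are all-reduced across processes

    dist.destroy_process_group()


if __name__ == "__main__":
    world_size = 2
    mp.spawn(worker, args=(world_size,), nprocs=world_size, join=True)
```

On multiple machines you would launch one such process per device on every node (e.g. via `torchrun`) and point them all at the same master address.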

I’m not familiar with horovod and don’t know what the advantages might be.

PS: please don’t tag specific users, as it might discourage others from posting better answers :wink: