As described in the DDP docs, DistributedDataParallel can run across multiple machines:
DistributedDataParallel (DDP) implements data parallelism at the module level which can run across multiple machines. Applications using DDP should spawn multiple processes and create a single DDP instance per process.
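To make the "one process per worker, one DDP instance per process" pattern concrete, here is a minimal sketch. It assumes the CPU-only `gloo` backend, and the names `run_worker` and `main`, the toy `nn.Linear` model, and the address/port defaults are placeholders of mine, not something from the docs quote above.

```python
import os

import torch
import torch.distributed as dist
import torch.multiprocessing as mp
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP


def run_worker(rank: int, world_size: int) -> float:
    """Run one training step in one process and return its local loss."""
    # Placeholder rendezvous settings (assumption): for a multi-machine run,
    # every process must see the address and port of the rank 0 host.
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")

    # Join the process group; rank must be unique across all processes.
    dist.init_process_group("gloo", rank=rank, world_size=world_size)

    # A single DDP instance per process, wrapping a toy model.
    model = DDP(nn.Linear(10, 1))
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

    # One step on random data; DDP averages gradients across processes
    # during backward().
    optimizer.zero_grad()
    loss = model(torch.randn(8, 10)).sum()
    loss.backward()
    optimizer.step()

    dist.destroy_process_group()
    return loss.item()


def main(world_size: int = 2) -> None:
    # Spawn world_size local processes; each receives its rank as first argument.
    mp.spawn(run_worker, args=(world_size,), nprocs=world_size, join=True)
```

To try it on one machine, save this as a script and call `main()` under an `if __name__ == "__main__":` guard. For several machines, the same worker code should apply as long as each process gets a globally unique rank and the same `MASTER_ADDR`/`MASTER_PORT`; a launcher such as `torchrun` can set these for you.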
I’m not familiar with Horovod, so I don’t know what advantages it might have.
PS: Please don’t tag specific users, as it might discourage others from posting better answers.