What is the key difference between torch.nn.parallel.DistributedDataParallel (from torch.distributed) and Horovod?
If my understanding is correct, DistributedDataParallel works on a single node with one or more GPUs (it does not distribute workloads across GPUs on more than one node), whereas Horovod can work with multi-node, multi-GPU setups.
If my understanding is not correct, could you kindly explain when to use Horovod and when to use DistributedDataParallel?
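For reference, this is the kind of minimal DistributedDataParallel setup I have in mind (just a sketch; the model, hyperparameters, and launch details are placeholders), launched with something like `torchrun --nproc_per_node=2 train.py`:

```python
import os

import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP


def main():
    # torchrun sets RANK, LOCAL_RANK, and WORLD_SIZE in the environment,
    # so init_process_group can read them via the default env:// method.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = nn.Linear(10, 10).to(local_rank)  # placeholder model
    ddp_model = DDP(model, device_ids=[local_rank])
    optimizer = torch.optim.SGD(ddp_model.parameters(), lr=0.01)

    # One dummy training step; DDP all-reduces gradients across
    # all participating processes during backward().
    inputs = torch.randn(32, 10, device=local_rank)
    loss = ddp_model(inputs).sum()
    loss.backward()
    optimizer.step()

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```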
Kindly share your thoughts. Thank you very much in advance!