How to use distributed PyTorch on multiple machines

(Surbhi) #1


I am trying to run distributed PyTorch on multiple machines. From the documentation, I can see that we can define the master address and its port. It is not clear how we can define which worker nodes are used for computation, as we can in TensorFlow using ClusterSpec.

Is there anything in PyTorch similar to TensorFlow's ClusterSpec, where we can define the nodes to be used in the computation?
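
To make the question concrete, here is a minimal sketch of the documented `env://` setup, assuming one process per machine. As far as I can tell, there is no central node list: each process only declares the master address and port, the total number of processes (`WORLD_SIZE`) and its own index (`RANK`). The `hosts` list, the addresses, and the helper names `cluster_env` and `join_cluster` are hypothetical and not part of PyTorch; the list just stands in for a ClusterSpec.

```python
import os


def cluster_env(hosts, port, rank):
    """Build the variables that the env:// init method reads for one process.

    hosts: hypothetical list of machine addresses, one process per machine;
           hosts[0] is treated as the master.
    rank:  index of the current machine in hosts.
    """
    if not 0 <= rank < len(hosts):
        raise ValueError("rank must be an index into hosts")
    return {
        "MASTER_ADDR": hosts[0],
        "MASTER_PORT": str(port),
        "WORLD_SIZE": str(len(hosts)),
        "RANK": str(rank),
    }


def join_cluster(hosts, port, rank, backend="gloo"):
    """Export the variables and join the process group.

    Blocks until all WORLD_SIZE processes have connected to the master.
    """
    # Imported lazily so cluster_env can be used without torch installed.
    import torch.distributed as dist

    os.environ.update(cluster_env(hosts, port, rank))
    dist.init_process_group(backend=backend, init_method="env://")
    return dist
```

Under these assumptions, the machine at index `i` would run `join_cluster(hosts, 29500, i)` with the same `hosts` list and port everywhere, so the set of participating nodes is determined by which machines start a process with a valid rank.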


(Surbhi) #2

Any suggestions?