Is it possible to assign different numbers of workers across hosts?


I have two hosts: one has two GPUs and the other has only one. Does torchrun allow me to assign a different number of workers to each host? According to the documentation, all the launch commands are required to use the same nproc_per_node.

Thanks for your question. I didn’t find it in the wiki. Maybe @Kiuk_Chung can help answer this question?


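This should work with torchrun. Run one launcher per host, each with its own --nproc_per_node (YOUR_TRAINING_SCRIPT.py below is a placeholder for your trainer and its arguments):

# on host0 (2x GPUs)
$ torchrun --rdzv_backend c10d --rdzv_endpoint $host0_hostname:$port --nnodes 2 --nproc_per_node 2 YOUR_TRAINING_SCRIPT.py

# on host1 (1x GPU)
$ torchrun --rdzv_backend c10d --rdzv_endpoint $host0_hostname:$port --nnodes 2 --nproc_per_node 1 YOUR_TRAINING_SCRIPT.py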

Just make sure you set the CUDA device from the LOCAL_RANK environment variable in your trainer, so that each worker on a host uses its own GPU.
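For reference, here is a minimal sketch of what that could look like in a trainer. It assumes the script is launched by torchrun (which sets LOCAL_RANK for every worker) and that the NCCL backend is wanted; the helper name local_device and the gpu_count check are made up for illustration and are not part of any PyTorch API.

```python
import os


def local_device(env=os.environ, gpu_count=None):
    """Return the CUDA device string for this worker, e.g. "cuda:1".

    Hypothetical helper: reads LOCAL_RANK, which torchrun sets per worker.
    """
    local_rank = int(env["LOCAL_RANK"])
    # Optional sanity check: fail early if this host was launched with more
    # workers than it has GPUs.
    if gpu_count is not None and local_rank >= gpu_count:
        raise RuntimeError(
            f"LOCAL_RANK={local_rank} but only {gpu_count} GPU(s) on this host"
        )
    return f"cuda:{local_rank}"


def main():
    # Imported here so the helper above can be used without torch installed.
    import torch
    import torch.distributed as dist

    device = torch.device(local_device(gpu_count=torch.cuda.device_count()))
    torch.cuda.set_device(device)  # pin this process to its own GPU

    # With the default env:// init method, rank and world size come from the
    # environment variables that torchrun exports.
    dist.init_process_group(backend="nccl")
    print(f"rank {dist.get_rank()} of {dist.get_world_size()} on {device}")
    dist.destroy_process_group()


# Only run when LOCAL_RANK is present, i.e. when started by torchrun.
if __name__ == "__main__" and "LOCAL_RANK" in os.environ:
    main()
```

With the two commands above, host0 would start workers with LOCAL_RANK 0 and 1, and host1 a single worker with LOCAL_RANK 0, so every worker ends up on a distinct GPU of its own host.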


As a side note, with heterogeneous nodes, performance could be bottlenecked by the node with fewer resources. I say “could” because it depends on various factors such as the hardware topology, the network topology, whether RDMA is available, and the collective operation patterns. We typically recommend homogeneous nodes for DDP.