I have two hosts: one has two GPUs and the other has only one. Can torchrun assign a different number of workers to each host? I ask because, according to the documentation, every launch command is required to use the same nproc_per_node.
Also, as a side note: with heterogeneous nodes, performance could be bottlenecked by the node with fewer resources. I say “could” because it depends on various factors, such as the hardware topology, the network topology, whether RDMA is available, and the collective operation patterns. We typically recommend homogeneous nodes for DDP.
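On the original question, one possible workaround is to skip torchrun and start each worker yourself with the environment variables that the `env://` rendezvous reads (`MASTER_ADDR`, `MASTER_PORT`, `WORLD_SIZE`, `RANK`), plus `LOCAL_RANK` for picking the GPU. This is only a rough sketch under assumptions, not something torchrun does for you: the helper `worker_envs`, the address `10.0.0.1`, and the port `29500` are all hypothetical placeholders.

```python
def worker_envs(procs_per_host, master_addr, master_port):
    """Build one env-var dict per worker, grouped by host.

    procs_per_host: worker count for each host, e.g. [2, 1].
    Global ranks are assigned contiguously, host by host.
    """
    world_size = sum(procs_per_host)
    plan = []
    rank = 0
    for nproc in procs_per_host:
        host_envs = []
        for local_rank in range(nproc):
            host_envs.append({
                "MASTER_ADDR": master_addr,        # hypothetical address of the rank 0 host
                "MASTER_PORT": str(master_port),   # hypothetical free port on that host
                "WORLD_SIZE": str(world_size),     # total workers across all hosts
                "RANK": str(rank),                 # unique global rank
                "LOCAL_RANK": str(local_rank),     # GPU index on this host
            })
            rank += 1
        plan.append(host_envs)
    return plan


# Two GPUs on the first host, one GPU on the second.
plan = worker_envs([2, 1], "10.0.0.1", 29500)
for host_idx, host_envs in enumerate(plan):
    for env in host_envs:
        print(f"host {host_idx}: RANK={env['RANK']} "
              f"LOCAL_RANK={env['LOCAL_RANK']} WORLD_SIZE={env['WORLD_SIZE']}")
```

Each worker would then be started on its host with its own set of variables exported and would call `torch.distributed.init_process_group` as usual. You lose what torchrun gives you (restarts, elasticity, automatic rank assignment), so treat this as a fallback rather than a recommendation.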