I typically see a DDP script launched by submitting multiple commands (one per process), e.g.:
python -m torch.distributed.launch --nproc_per_node=1 --nnodes=3 --node_rank=0 --master_addr=127.0.0.1 --master_port=12345
python -m torch.distributed.launch --nproc_per_node=1 --nnodes=3 --node_rank=1 --master_addr=10.47.164.34 --master_port=12345
python -m torch.distributed.launch --nproc_per_node=1 --nnodes=3 --node_rank=2 --master_addr=10.47.164.34 --master_port=12345
with the script itself calling:
torch.distributed.init_process_group(backend="nccl", init_method="env://")
However, I am using a cluster-management system, and the admin would prefer that I submit only one command, which means every process would have to be started by the same command.
Are there any examples of using mpiexec (just to submit the command) or anything else, so that the master, slaves, etc. are set up automatically?
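
To make the question concrete, here is a rough sketch of what I imagine the script could do if the single command were an mpiexec launch. It is only a guess at an approach, not something I have verified on a cluster. It assumes Open MPI, which exports OMPI_COMM_WORLD_RANK, OMPI_COMM_WORLD_SIZE and OMPI_COMM_WORLD_LOCAL_RANK to each process, and it assumes the job script exports MASTER_ADDR (and optionally MASTER_PORT) to every process. The helper name ddp_env_from_mpi is made up for illustration.

```python
import os


def ddp_env_from_mpi(environ=None):
    """Map Open MPI's per-process variables to the ones init_method="env://" reads.

    Assumption: the processes were started by Open MPI's mpiexec/mpirun, and
    MASTER_ADDR was exported to every process by the job script.
    """
    environ = os.environ if environ is None else environ
    return {
        "RANK": environ["OMPI_COMM_WORLD_RANK"],
        "WORLD_SIZE": environ["OMPI_COMM_WORLD_SIZE"],
        # Local rank is only needed to pick a GPU; default to 0 if it is absent.
        "LOCAL_RANK": environ.get("OMPI_COMM_WORLD_LOCAL_RANK", "0"),
        # Must be the same reachable address on every process.
        "MASTER_ADDR": environ["MASTER_ADDR"],
        # Falls back to the port used in the commands above.
        "MASTER_PORT": environ.get("MASTER_PORT", "12345"),
    }


# Only attempt the real initialisation when actually running under Open MPI.
if __name__ == "__main__" and "OMPI_COMM_WORLD_RANK" in os.environ:
    import torch.distributed as dist

    os.environ.update(ddp_env_from_mpi())
    dist.init_process_group(backend="nccl", init_method="env://")
```

With something like this, every process would run the identical command and work out its own rank from the environment, so no per-process --node_rank would be needed.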