Question about init_method in distributed.init_process_group

The default for init_method is init_method='env://' (I'm using the NCCL backend). If I want to run another script, what kind of URL can I use? Thanks.

What do you mean with “run another code”? Do you want to use another distributed backend or different initialization method?

For example (ignore the export of NGPUS):

python -m torch.distributed.launch --nproc_per_node=$NGPUS run1.py
python -m torch.distributed.launch --nproc_per_node=$NGPUS run2.py

After trying it, a form like “env://tmp” works. Thank you.

I see. The env:// initialization method pulls all information it needs from the environment, so it will be isolated to a single run. If you use the file:// initialization method, and any of the processes crashes, it may leave a stale file that prevents you from running something else until you delete it.
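To make the "pulls all information from the environment" part concrete: env:// reads MASTER_ADDR, MASTER_PORT, WORLD_SIZE and RANK. Here is a rough sketch, assuming both jobs run on the same machine, of building a separate environment with its own rendezvous port for each run. The helper names find_free_ports and make_env are made up for illustration and are not part of PyTorch.

```python
import os
import socket


def find_free_ports(count):
    """Ask the OS for `count` distinct unused TCP ports (hypothetical helper).

    All sockets are held open until every port is chosen, so the OS
    cannot hand out the same port twice within one call.
    """
    sockets = []
    try:
        for _ in range(count):
            s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
            s.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
            sockets.append(s)
        return [s.getsockname()[1] for s in sockets]
    finally:
        for s in sockets:
            s.close()


def make_env(rank, world_size, port):
    """Copy the current environment and set the variables env:// reads."""
    env = dict(os.environ)
    env.update({
        "MASTER_ADDR": "127.0.0.1",  # assumption: single-machine job
        "MASTER_PORT": str(port),
        "WORLD_SIZE": str(world_size),
        "RANK": str(rank),
    })
    return env


# One rendezvous port per job, so run1.py and run2.py do not collide.
port_run1, port_run2 = find_free_ports(2)
env_run1 = make_env(rank=0, world_size=1, port=port_run1)
env_run2 = make_env(rank=0, world_size=1, port=port_run2)
```

Each dictionary could then be passed as env= to subprocess.Popen when starting a worker. If you stay with torch.distributed.launch, its --master_port flag serves the same purpose: as far as I know it defaults to 29500, so a second concurrent launch on the same machine needs a different value. Note that a port found this way is only free at the moment it is checked, so another process could still grab it before your job starts.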
