The default init_method is init_method='env://' (using the nccl backend). If I want to run another code, what kind of URL can I use? Thanks.
What do you mean by “run another code”? Do you want to use another distributed backend or a different initialization method?
For example: (ignore the export NGPUS)
python -m torch.distributed.launch --nproc_per_node=$NGPUS run1.py
python -m torch.distributed.launch --nproc_per_node=$NGPUS run2.py
After trying it, I found that a URL of the form “env://tmp” works. Thank you.
I see. The env:// initialization method pulls all the information it needs from the environment, so it will be isolated to a single run. If you use the file:// initialization method and any of the processes crashes, it may leave a stale file that prevents you from running something else until you delete it.
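
To make the difference concrete, here is a minimal sketch of both initialization methods in a single process. Everything named here is an assumption for illustration: the helpers env_rendezvous, unique_file_init_method and run_demo are hypothetical, port 29501 is an arbitrary free port, and the gloo backend is used only so the sketch can run on a CPU-only machine (the thread uses nccl). With env://, torch.distributed reads MASTER_ADDR, MASTER_PORT, RANK and WORLD_SIZE from the environment; torch.distributed.launch sets these for you, so you would normally not set them by hand.

```python
import os
import tempfile
import uuid


def env_rendezvous(master_port, rank=0, world_size=1, master_addr="127.0.0.1"):
    """Hypothetical helper: the environment variables that env:// reads."""
    return {
        "MASTER_ADDR": master_addr,
        "MASTER_PORT": str(master_port),
        "RANK": str(rank),
        "WORLD_SIZE": str(world_size),
    }


def unique_file_init_method(directory=None):
    """Hypothetical helper: a file:// URL that no other run shares.

    Assumes a POSIX-style absolute path. The file is not created here;
    the path is returned so the caller can delete a stale file later.
    """
    directory = directory or tempfile.gettempdir()
    path = os.path.join(directory, "dist_init_" + uuid.uuid4().hex)
    return "file://" + path, path


def run_demo():
    # Imported lazily so the helpers above work without PyTorch installed.
    import torch.distributed as dist

    # env://: everything comes from the environment of this one run.
    os.environ.update(env_rendezvous(master_port=29501))  # arbitrary port
    dist.init_process_group(backend="gloo", init_method="env://")
    dist.destroy_process_group()

    # file://: rank and world_size must be passed explicitly.
    url, path = unique_file_init_method()
    try:
        dist.init_process_group(
            backend="gloo", init_method=url, rank=0, world_size=1
        )
        dist.destroy_process_group()
    finally:
        # Remove the rendezvous file even after a crash, so it cannot
        # go stale and block the next run.
        if os.path.exists(path):
            os.remove(path)
```

Calling run_demo() in an environment with PyTorch installed exercises both paths. Generating a fresh file name per run and deleting it in a finally block is one way to avoid the stale-file problem; it only helps if every process of the same run is given the same URL.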