Hi, I’m using DistributedDataParallel to run my model across multiple GPUs with synchronized batch norm. However, my script uses relative imports, so it is supposed to be run with the -m option. How can I do this when launching it via torch.distributed.launch?
Example (does not work, but illustrates what I’d like to do): python -m torch.distributed.launch --nproc_per_node 2 -m detector.train --arg1 --arg2
Thanks for your reply, but I think you misunderstood. The issue is not running torch.distributed.launch itself with the -m option. The problem is that my own script uses relative imports and is supposed to be run with -m. As far as I can tell, when torch.distributed.launch spawns the script it uses the plain form python detector/script.py, whereas I’d like it to be called as python -m detector.script.
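To illustrate the difference, here is a minimal, self-contained repro (the `detector` package and its contents are made up for the demonstration): a module containing a relative import fails when invoked as a plain script, but works when invoked with `python -m` from the package root.

```python
import os
import subprocess
import sys
import tempfile

# Build a throwaway package: detector/{__init__,utils,train}.py,
# where train.py uses a relative import.
tmp = tempfile.mkdtemp()
pkg = os.path.join(tmp, "detector")
os.makedirs(pkg)
open(os.path.join(pkg, "__init__.py"), "w").close()
with open(os.path.join(pkg, "utils.py"), "w") as f:
    f.write("VALUE = 42\n")
with open(os.path.join(pkg, "train.py"), "w") as f:
    f.write("from .utils import VALUE\nprint(VALUE)\n")

# Plain-script invocation: the relative import has no parent package,
# so Python raises ImportError and the process exits non-zero.
as_script = subprocess.run(
    [sys.executable, os.path.join(pkg, "train.py")],
    capture_output=True, text=True,
)

# Module invocation from the package root: the package context is set up,
# the relative import resolves, and the module runs normally.
as_module = subprocess.run(
    [sys.executable, "-m", "detector.train"],
    capture_output=True, text=True, cwd=tmp,
)

print("as script:", as_script.returncode)   # non-zero (ImportError)
print("as module:", as_module.returncode)   # 0
```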
You can create a copy of that launcher file and customize it the way you want. The module spawns one process per rank and builds the command line for each of them, so you could rearrange your copy so that the spawned command looks like cmd = python -m detector.script --local_rank --arg1 --arg2 ....
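As a sketch of that customization (assuming your copy of the launcher builds a `cmd` list and passes it to `subprocess.Popen`, as torch.distributed.launch does; `build_cmd` and the `detector.train` module name are hypothetical), the per-process command could be assembled in the `-m` form like this:

```python
import sys

def build_cmd(module, local_rank, extra_args):
    """Build the per-process command a customized launcher would spawn.

    Hypothetical helper: in your copy of the launcher, replace the line
    that builds cmd = [sys.executable, training_script, ...] with this
    `-m` form so each worker runs the script as a package module.
    """
    return [
        sys.executable,
        "-m", module,                          # run as module, not script
        "--local_rank={}".format(local_rank),  # rank passed by the launcher
    ] + list(extra_args)

# One worker's command, ready for subprocess.Popen(cmd):
cmd = build_cmd("detector.train", 0, ["--arg1", "--arg2"])
print(cmd)
```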