Hi, I’m using DistributedDataParallel to run my model across multiple GPUs with synchronized batch norm. However, my script uses relative imports, so it is supposed to be run with the -m option. How can I do this when launching it via torch.distributed.launch?
Example (does not work, but illustrates what I’d like to do): python -m torch.distributed.launch --nproc_per_node 2 -m detector.train --arg1 --arg2
Thanks for your reply, but I think you misunderstood. The issue is not running torch.distributed.launch itself with the -m option. The problem is that my own script uses relative imports and is supposed to be run with -m. As far as I can tell, when torch.distributed.launch spawns the script it uses the plain form python detector/script.py, whereas I’d like it to be called as python -m detector.script.
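To illustrate the difference, here is a minimal, self-contained repro (the `detector` package and its contents are made up for the demonstration): a module containing a relative import fails when invoked as a plain script, but works when invoked with `python -m` from the package root.

```python
import os
import subprocess
import sys
import tempfile

# Build a throwaway package: detector/{__init__,utils,train}.py,
# where train.py uses a relative import.
tmp = tempfile.mkdtemp()
pkg = os.path.join(tmp, "detector")
os.makedirs(pkg)
open(os.path.join(pkg, "__init__.py"), "w").close()
with open(os.path.join(pkg, "utils.py"), "w") as f:
    f.write("VALUE = 42\n")
with open(os.path.join(pkg, "train.py"), "w") as f:
    f.write("from .utils import VALUE\nprint(VALUE)\n")

# Plain-script invocation: the relative import has no parent package,
# so Python raises ImportError and the process exits non-zero.
as_script = subprocess.run(
    [sys.executable, os.path.join(pkg, "train.py")],
    capture_output=True, text=True,
)

# Module invocation from the package root: the package context is set up,
# the relative import resolves, and the module runs normally.
as_module = subprocess.run(
    [sys.executable, "-m", "detector.train"],
    capture_output=True, text=True, cwd=tmp,
)

print("as script:", as_script.returncode)   # non-zero (ImportError)
print("as module:", as_module.returncode)   # 0
```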
You can create a copy of that launcher file and customize it the way you want. The module spawns one process per rank and builds the command line for each of them, so you could rearrange your copy so that the spawned command looks like cmd = python -m detector.script --local_rank --arg1 --arg2 ....
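As a sketch of that customization (assuming your copy of the launcher builds a `cmd` list and passes it to `subprocess.Popen`, as torch.distributed.launch does; `build_cmd` and the `detector.train` module name are hypothetical), the per-process command could be assembled in the `-m` form like this:

```python
import sys

def build_cmd(module, local_rank, extra_args):
    """Build the per-process command a customized launcher would spawn.

    Hypothetical helper: in your copy of the launcher, replace the line
    that builds cmd = [sys.executable, training_script, ...] with this
    `-m` form so each worker runs the script as a package module.
    """
    return [
        sys.executable,
        "-m", module,                          # run as module, not script
        "--local_rank={}".format(local_rank),  # rank passed by the launcher
    ] + list(extra_args)

# One worker's command, ready for subprocess.Popen(cmd):
cmd = build_cmd("detector.train", 0, ["--arg1", "--arg2"])
print(cmd)
```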