Torchrun and python -m

How do I launch multi-node training via torchrun when my script contains relative imports?
Previously, my script spawned its workers by itself, so I could bypass torch.distributed/torchrun and launch training like this:

python -m parent.train

But now I want to use torchrun.
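
Concretely, I'd like the relative imports inside `parent/` to keep resolving when the workers are launched by torchrun instead of by me. From what I can tell, torchrun (`torch.distributed.run`) accepts a `-m`/`--module` flag that mirrors `python -m`, so something like the sketch below is what I'm hoping works. The node/GPU counts and the `MASTER_ADDR` rendezvous endpoint are placeholders for illustration:

```shell
# Run this on every node. The -m flag tells torchrun to interpret the
# target as a Python module (like `python -m parent.train`), so the
# relative imports inside the `parent` package still resolve.
# Placeholders: 2 nodes x 4 GPUs each; MASTER_ADDR = hostname of node 0.
torchrun \
  --nnodes=2 \
  --nproc_per_node=4 \
  --rdzv_backend=c10d \
  --rdzv_endpoint="$MASTER_ADDR:29500" \
  -m parent.train
```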

Are you actually seeing torchrun fail because of the relative imports? Could you post a minimal repro script and paste the error output?

Hi @kimihailv,

Have you tried using absolute imports instead of relative ones? That's common practice when running on HPC clusters.