Can multiprocessing.Lock / Condition be used with torchrun?

Hi,

I’m trying to understand synchronization options when using torchrun.

multiprocessing.Lock and multiprocessing.Condition rely on the objects being inherited from or passed by a common parent process (fork / mp.spawn), but torchrun launches each rank as an independent subprocess (spawn + exec), so these primitives can't be shared across ranks.

Questions:

  1. Is it fundamentally unsupported to use multiprocessing.Lock / Condition with torchrun?

  2. Would creating a multiprocessing.Manager before torchrun and having all ranks connect to it be considered supported or recommended?

  3. What is the intended torch-native replacement for Condition-like (wait/notify) semantics in torchrun jobs?
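To make question 2 concrete, the setup I have in mind is roughly the sketch below: a standalone manager server started before torchrun, with each rank connecting to it to obtain a proxy to one shared Condition. The address, port, and authkey are placeholders.

```python
import threading
from multiprocessing.managers import BaseManager

# Placeholder rendezvous details for the sync server (assumptions).
ADDRESS = ("127.0.0.1", 50000)
AUTHKEY = b"torchrun-sync"

def run_server():
    """Run once, outside torchrun, as its own process."""
    cond = threading.Condition()  # the single shared Condition

    class ServerManager(BaseManager):
        pass

    # Every call to get_condition() returns the same object, so all
    # connected ranks end up sharing one Condition.
    ServerManager.register("get_condition", callable=lambda: cond)
    mgr = ServerManager(address=ADDRESS, authkey=AUTHKEY)
    mgr.get_server().serve_forever()

def connect():
    """Call from each torchrun rank; returns a proxy to the shared Condition."""
    class ClientManager(BaseManager):
        pass

    ClientManager.register("get_condition")
    mgr = ClientManager(address=ADDRESS, authkey=AUTHKEY)
    mgr.connect()
    return mgr.get_condition()
```

Each rank would then call e.g. `cond.acquire()` / `cond.wait()` / `cond.notify()` / `cond.release()` on the returned proxy, with the calls forwarded over the socket to the server process.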

Thanks!

You could use the torch.distributed primitives instead: torch.distributed.barrier() gives you barrier-style synchronization across ranks, and for Condition-like wait/notify semantics the c10d key-value store (e.g. TCPStore with set() and wait()) is the closest torch-native equivalent.
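A minimal sketch of both, assuming a gloo backend and the environment variables torchrun sets per worker (the key name and the second port are illustrative assumptions):

```python
import os
import torch.distributed as dist

def rendezvous_sync():
    # Under torchrun, RANK, WORLD_SIZE, MASTER_ADDR and MASTER_PORT are
    # already in the environment, so env:// rendezvous needs no arguments.
    if not dist.is_initialized():
        dist.init_process_group(backend="gloo", init_method="env://")

    # Barrier-style sync: every rank blocks here until all ranks arrive.
    dist.barrier()

    # Condition-like wait/notify via the c10d key-value store. The port
    # (29600) is an assumption; it just has to differ from MASTER_PORT.
    store = dist.TCPStore(
        os.environ.get("MASTER_ADDR", "127.0.0.1"),
        29600,
        dist.get_world_size(),
        is_master=(dist.get_rank() == 0),
    )
    if dist.get_rank() == 0:
        store.set("checkpoint_ready", "1")   # "notify": publish the key
    else:
        store.wait(["checkpoint_ready"])     # "wait": block until the key exists
```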