Can multiprocessing.Lock / Condition be used with torchrun?

dem123456789 · January 10, 2026, 9:28am

Hi,

I’m trying to understand synchronization options when using torchrun.

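multiprocessing.Lock and multiprocessing.Condition rely on process inheritance: a parent creates them and hands them to the children it starts (fork / mp.spawn). torchrun instead launches each rank via spawn + exec, so no such object is shared between ranks, and these primitives don't seem to work across them.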
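For contrast, here is a minimal sketch of the case that does work: the parent creates one Lock and passes it to child processes it starts itself, so every child locks the same underlying semaphore. It assumes a POSIX system (the "fork" start method), and `run_inherited` / `_worker` are hypothetical names used only for illustration.

```python
import multiprocessing as mp


def _worker(lock, counter, n_iters):
    # `lock` and `counter` are the parent's objects, handed to this child
    # when the process is created; that hand-off is the "inheritance" step.
    for _ in range(n_iters):
        with lock:
            counter.value += 1


def run_inherited(n_procs=4, n_iters=1000):
    ctx = mp.get_context("fork")  # assumption: POSIX-only start method
    lock = ctx.Lock()
    counter = ctx.Value("i", 0, lock=False)  # raw shared int, guarded by `lock`
    procs = [ctx.Process(target=_worker, args=(lock, counter, n_iters))
             for _ in range(n_procs)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return counter.value


if __name__ == "__main__":
    print(run_inherited())
```

Under torchrun there is no parent in user code that could pass `lock` along like this: each rank would construct its own `mp.Lock()`, which protects nothing across ranks.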

Questions:

Is it fundamentally unsupported to use multiprocessing.Lock / Condition with torchrun?
Would starting a multiprocessing.Manager server before launching torchrun, and having all ranks connect to it, be considered supported or recommended?
What is the intended torch-native replacement for Condition-like (wait/notify) semantics in torchrun jobs?
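The Manager idea in the second question would look roughly like the sketch below. This is plain `multiprocessing.managers` machinery with nothing torch-specific in it, so it says nothing about whether PyTorch supports or recommends it. `SyncServer`, `serve_in_background` and `connect` are hypothetical names; for a self-contained example the server runs in a daemon thread, whereas in practice it would presumably be a separate process started before torchrun.

```python
import threading
from multiprocessing.managers import AcquirerProxy, BaseManager, ConditionProxy

# Server-side singletons: every client that asks receives proxies to the
# *same* Lock and Condition, which is what makes them shared across ranks.
_LOCK = threading.Lock()
_COND = threading.Condition()


def _get_lock():
    return _LOCK


def _get_cond():
    return _COND


class SyncServer(BaseManager):
    """Hypothetical manager serving one shared Lock and one shared Condition."""


# AcquirerProxy / ConditionProxy add `with` support and wait/notify methods.
SyncServer.register("get_lock", callable=_get_lock, proxytype=AcquirerProxy)
SyncServer.register("get_cond", callable=_get_cond, proxytype=ConditionProxy)


def serve_in_background(address, authkey):
    """Run the manager server in a daemon thread and return its bound address."""
    server = SyncServer(address=address, authkey=authkey).get_server()
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server.address


def connect(address, authkey):
    """What each rank would do: attach to the already running server."""
    manager = SyncServer(address=address, authkey=authkey)
    manager.connect()
    return manager.get_lock(), manager.get_cond()
```

Each rank would call `connect()` with an address it learns from the launcher (for example through an environment variable you define yourself), then use `with lock:` or `with cond: cond.wait()` as usual. Every acquire, wait and notify is a network round-trip to that server, and if the server process exits, those calls from the ranks will fail.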

Thanks!

ptrblck · January 11, 2026, 6:17pm

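You could try to use the distributed ops from torch.distributed instead, such as torch.distributed.barrier(), which blocks each rank until every rank in the process group has reached it.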
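A minimal sketch of both ideas, assuming the process group is already initialized (as it would be under torchrun). `rank0_then_all` covers the common "rank 0 prepares, everyone else waits" case with `dist.barrier()`. For wait/notify where not every rank must participate, it swaps in a different technique: an event-style pattern on a torch.distributed key-value Store (`add`, `set`, `wait`). This is not a full Condition (there is no associated lock and no notify-one). `notify` and `wait` are hypothetical helper names, and the key layout `"<name>/<generation>"` is an assumption of this sketch.

```python
from datetime import timedelta

import torch.distributed as dist


def rank0_then_all(prepare):
    """Rank 0 runs `prepare`; no rank returns until every rank hits the barrier."""
    if dist.get_rank() == 0:
        prepare()
    dist.barrier()


def notify(store, name):
    """Publish the next generation of event `name`, waking all its waiters."""
    gen = store.add(f"{name}/gen", 1)  # atomic counter kept in the store
    store.set(f"{name}/{gen}", "1")
    return gen


def wait(store, name, gen, timeout=timedelta(seconds=30)):
    """Block until generation `gen` of event `name` has been published."""
    store.wait([f"{name}/{gen}"], timeout)
```

Under torchrun, the store passed to `notify` / `wait` would have to be one that all ranks share, for example a `dist.TCPStore` hosted by rank 0 on a port you choose (an assumption, not something torchrun sets up for you). Waiters need to know which generation to wait for, e.g. by counting the notifications they have already consumed.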