Is it possible to use the multiprocessing Queue to communicate between processes launched with torchrun?
I set up a Queue and launch the processes like so:
test.py
import os

import torch.distributed as dist
from torch.multiprocessing import Queue

if __name__ == "__main__":
    # torchrun sets these environment variables for every worker it starts
    local_rank = int(os.getenv("LOCAL_RANK", 0))
    world_size = int(os.getenv("WORLD_SIZE", 1))
    rank = int(os.getenv("RANK", 0))
    is_distributed = world_size > 1
    queue = Queue()
    if is_distributed:
        dist.init_process_group(backend="gloo", rank=rank, world_size=world_size)
    .....
launch.sh
#!/bin/bash
#
export OMP_NUM_THREADS=1
torchrun \
    --standalone \
    --nnodes=1 \
    --nproc_per_node=4 \
    --master_addr=$(hostname) \
    --master_port=34567 \
    test.py
With this setup, I find that each rank has its own independent Queue. Is it possible to share a single queue between processes launched using torchrun?
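For context, torchrun starts each rank as a separate Python interpreter running test.py, so every rank executes Queue() itself and gets its own object. One approach that might work is to have rank 0 serve a single queue.Queue over a socket with the standard library's multiprocessing.managers.BaseManager and let every rank connect to it. This is only a minimal sketch under assumptions: the helper names serve_queue and connect_queue, the address, and the authkey are hypothetical and not part of torchrun or PyTorch.

```python
import queue
import threading
from multiprocessing.managers import BaseManager


def serve_queue(address, authkey):
    """Serve one queue.Queue from a background thread (intended for rank 0).

    Returns the address the server is actually bound to.
    """
    shared = queue.Queue()

    class _ServerManager(BaseManager):
        pass

    # Every client call to get_queue() is answered with the same object.
    _ServerManager.register("get_queue", callable=lambda: shared)

    server = _ServerManager(address=address, authkey=authkey).get_server()
    # serve_forever() blocks, so run it in a daemon thread; the listening
    # socket is already bound once get_server() has returned.
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server.address


def connect_queue(address, authkey):
    """Connect to the served queue and return a proxy for it (any rank)."""

    class _ClientManager(BaseManager):
        pass

    # Clients only declare the name; the object lives in the server.
    _ClientManager.register("get_queue")

    manager = _ClientManager(address=address, authkey=authkey)
    manager.connect()
    return manager.get_queue()
```

In test.py this could be used by calling serve_queue on rank 0 only, then dist.barrier() so the other ranks connect after the server is listening, then connect_queue on every rank with the same address and authkey. The returned proxy supports put and get like a normal queue, but items must be picklable since they travel over a socket.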