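Replacing `mp.spawn` with explicit calls to the `start` and `join` methods of `mp.Process` can solve this problem. The root cause is not certain. It may be a flaw in how PyTorch's `spawn` sets up shared memory between the child processes, which then ends in a segmentation fault. One relevant difference is that `mp.spawn` always launches its workers with the `spawn` start method, whereas `mp.Process` uses the platform default (`fork` on Linux). Starting and joining the processes manually avoided the segmentation faults.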
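```python
import torch.multiprocessing as mp

# mp.spawn(run, args=(world_size, q), nprocs=world_size, join=True)
# mp.spawn passes the process index as the first argument, so pass it explicitly here.
children = []
for i in range(world_size):
    subproc = mp.Process(target=run, args=(i, world_size, q))
    children.append(subproc)
    subproc.start()

# Wait for every child process to finish.
for subproc in children:
    subproc.join()
```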
With this change, the code should run without segmentation faults.
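One thing lost in the manual version is error reporting: `mp.spawn(..., join=True)` raises an exception in the parent when a child fails, while a bare `join()` returns silently. Below is a minimal, self-contained sketch that restores that behaviour by checking each child's `exitcode`. It uses the standard-library `multiprocessing` module so it runs without PyTorch (`torch.multiprocessing` wraps it and exposes the same `Process` API). The worker `run` and the helper `start_and_join` are hypothetical stand-ins, not part of either library.

```python
import multiprocessing as mp  # torch.multiprocessing exposes the same Process API


def run(rank, world_size, q):
    # Hypothetical worker standing in for the real training function.
    q.put((rank, rank * rank))


def start_and_join(fn, world_size, q):
    """Start one process per rank, then wait for all of them (hypothetical helper)."""
    children = []
    for rank in range(world_size):
        p = mp.Process(target=fn, args=(rank, world_size, q))
        children.append(p)
        p.start()
    for p in children:
        p.join()
    # mp.spawn(..., join=True) raises if a child fails; emulate that by hand.
    failed = [rank for rank, p in enumerate(children) if p.exitcode != 0]
    if failed:
        raise RuntimeError(f"processes for ranks {failed} exited with a non-zero code")


if __name__ == "__main__":
    world_size = 4
    q = mp.Queue()
    start_and_join(run, world_size, q)
    results = sorted(q.get() for _ in range(world_size))
    print(results)  # [(0, 0), (1, 1), (2, 4), (3, 9)]
```

The `if __name__ == "__main__":` guard matters on platforms whose default start method is `spawn` (Windows, macOS), because each child re-imports the main module.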