Trouble understanding multiprocessing in python and PyTorch (Connection Refused)

Caden_Miller · February 3, 2022, 6:33pm

I am new to both Python and PyTorch and am trying to understand how multiprocessing works. The following code:

import torch
import torch.multiprocessing as mp

def test(q):
  t = torch.normal(mean=0.0, std=1.0, size=(2, 3))
  q.put(t)

if __name__ == "__main__":
  mp.set_start_method("spawn", force=True)

  q = mp.SimpleQueue()
  processes = []
  for _ in range(4):
    p = mp.Process(target=test, args=(q,))
    p.start()
    processes.append(p)
  for p in processes:
    p.join()
  while q.empty() == False:
    print(q.get())

produces the following error:
Traceback (most recent call last):
  File "/home/cadenmiller/Documents/coding/pyTorch/mp.py", line 20, in <module>
    print(q.get())
  File "/usr/lib/python3.9/multiprocessing/queues.py", line 368, in get
    return _ForkingPickler.loads(res)
  File "/home/cadenmiller/.local/lib/python3.9/site-packages/torch/multiprocessing/reductions.py", line 289, in rebuild_storage_fd
    fd = df.detach()
  File "/usr/lib/python3.9/multiprocessing/resource_sharer.py", line 57, in detach
    with _resource_sharer.get_connection(self._id) as conn:
  File "/usr/lib/python3.9/multiprocessing/resource_sharer.py", line 86, in get_connection
    c = Client(address, authkey=process.current_process().authkey)
  File "/usr/lib/python3.9/multiprocessing/connection.py", line 507, in Client
    c = SocketClient(address)
  File "/usr/lib/python3.9/multiprocessing/connection.py", line 635, in SocketClient
    s.connect(address)
ConnectionRefusedError: [Errno 111] Connection refused

I saw somewhere online that this happens because the subprocess must keep running until I get the tensor from the queue, but I have no idea how to do that. Any advice or suggestions would be great.
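
For reference, here is a minimal sketch of what "keeping the subprocess running until the tensor is received" could look like. It assumes the explanation above is correct: the traceback shows the parent trying to connect back to the sending process (through resource_sharer) while rebuilding the tensor, and that fails if the sender has already exited. The names worker, collect_results, and done are made up for this illustration and are not part of PyTorch. The idea is to have each worker wait on an Event after putting its tensor, and to set that Event only after the parent has pulled every result off the queue.

```python
import torch
import torch.multiprocessing as mp


def worker(q, done):
    # Produce one tensor and hand it to the parent through the queue.
    t = torch.normal(mean=0.0, std=1.0, size=(2, 3))
    q.put(t)
    # Stay alive until the parent signals that it has received everything.
    done.wait()


def collect_results(num_workers=4):
    # Use an explicit "spawn" context instead of changing the global start method.
    ctx = mp.get_context("spawn")
    q = ctx.SimpleQueue()
    done = ctx.Event()

    processes = [
        ctx.Process(target=worker, args=(q, done)) for _ in range(num_workers)
    ]
    for p in processes:
        p.start()

    # Read exactly one result per worker while the workers are still running.
    results = [q.get() for _ in range(num_workers)]

    # Only now let the workers exit, then wait for them.
    done.set()
    for p in processes:
        p.join()
    return results


if __name__ == "__main__":
    for t in collect_results():
        print(t)
```

Note that this sketch reads a fixed number of items rather than looping on q.empty(), so the parent does not depend on when each worker happens to finish. One caveat: if a worker crashes before calling q.put, the parent's q.get() will block forever, so real code would probably want a timeout or some error handling around it.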