Return from mp.spawn()

arinaruck · August 27, 2020, 12:24pm

Hi!
I am using an nn.parallel.DistributedDataParallel model for both training and inference on multiple GPUs.
To achieve that I use
mp.spawn(evaluate, nprocs=n_gpu, args=(args, eval_dataset))
To evaluate, I first need to run the dev dataset examples through the model and then aggregate the results. Therefore I need to be able to return my predictions to the main process (possibly in a dict, but some other data structure would work as well). I’ve tried passing an extra dict argument, mp.spawn(evaluate, nprocs=n_gpu, args=(args, eval_dataset, out_dict)), and modifying it in the function, but apparently spawn copies it, so the dict in the main process is not modified.
I guess I could write the results to a file and then read it in the main process, but that doesn’t seem like the most elegant solution. Is there a better way to return values from spawned functions?
Thanks!

mrshenli · August 27, 2020, 2:28pm

Is there a better way to return values from spawned functions?

If you want to pass the results from the spawned processes back to the parent process, you can let the parent process create multiprocessing queues, pass them to the child processes, and let the child processes send their results back through the queues. See the following code:

https://github.com/pytorch/pytorch/blob/cb26661fe4faf26386703180a9045e6ac6d157df/test/test_multiprocessing.py#L577-L600

          @unittest.skipIf(NO_MULTIPROCESSING_SPAWN, "Disabled for environments that \
                           don't support multiprocessing with spawn start method")
          @unittest.skipIf(not TEST_CUDA_IPC, 'CUDA IPC not available')
          def test_event_multiprocess(self):
              event = torch.cuda.Event(enable_timing=False, interprocess=True)
              self.assertTrue(event.query())
          
              ctx = mp.get_context('spawn')
              p2c = ctx.SimpleQueue()
              c2p = ctx.SimpleQueue()
              p = ctx.Process(
                  target=TestMultiprocessing._test_event_multiprocess_child,
                  args=(event, p2c, c2p))
              p.start()
          
              c2p.get()  # wait for until child process is ready
              torch.cuda._sleep(50000000)  # spin for about 50 ms
              event.record()
              p2c.put(0)  # notify child event is recorded

This file has been truncated.
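
A minimal sketch of the same queue idea applied to mp.spawn itself, assuming the queue is created from the spawn context and handed to the workers through args. The worker evaluate, the helper merge_by_rank, and the fabricated predictions are hypothetical stand-ins and are not taken from the linked test. join=False is used so that the parent can read the queue before it waits for the children.

```python
import torch.multiprocessing as mp


def evaluate(rank, n_items, result_queue):
    # Stand-in for real inference: fabricate a few "predictions" per rank.
    preds = [rank * n_items + i for i in range(n_items)]
    # Tag the payload with the rank, because results arrive in arbitrary order.
    result_queue.put((rank, preds))


def merge_by_rank(items):
    """Concatenate (rank, preds) pairs in rank order."""
    merged = []
    for _, preds in sorted(items, key=lambda item: item[0]):
        merged.extend(preds)
    return merged


def main(nprocs=2, n_items=3):
    # Create the queue from the same start method that mp.spawn uses.
    result_queue = mp.get_context("spawn").SimpleQueue()
    # join=False returns a context object immediately instead of blocking.
    context = mp.spawn(
        evaluate, args=(n_items, result_queue), nprocs=nprocs, join=False
    )
    # One get() per worker. Note: this blocks forever if a worker dies
    # before calling put(), so real code may want extra error handling.
    items = [result_queue.get() for _ in range(nprocs)]
    # context.join() returns True once every child has exited.
    while not context.join():
        pass
    return merge_by_rank(items)


if __name__ == "__main__":
    print(main())
```

If the workers produce tensors, converting them to CPU tensors or plain Python lists before calling put() is probably the simplest option, since sharing CUDA tensors between processes comes with extra lifetime restrictions.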

If the results do not have to go back to the parent process, you can use gather or all_gather from torch.distributed to communicate them across the child processes.
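
A rough sketch of the all_gather route, assuming a PyTorch release that provides torch.distributed.all_gather_object and a process group that has already been initialized in every worker. The function name gather_predictions is hypothetical, and the predictions are assumed to be a picklable list.

```python
import torch.distributed as dist


def gather_predictions(local_preds):
    """Return the predictions of all ranks, concatenated in rank order.

    Assumes dist.init_process_group(...) was already called in this process.
    With the NCCL backend, each rank should also have selected its own GPU
    via torch.cuda.set_device() before calling this.
    """
    gathered = [None] * dist.get_world_size()
    # Every rank receives the full list; entry i holds the object from rank i.
    dist.all_gather_object(gathered, local_preds)
    return [pred for rank_preds in gathered for pred in rank_preds]
```

Each worker would call gather_predictions(preds) after its inference loop, and any single rank (for example rank 0) could then compute the metrics on the combined list.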

xannex · November 17, 2020, 4:04pm

Hi Arina. What was your solution in the end?

arinaruck · November 17, 2020, 4:22pm

Hi, for me writing to files actually worked out quite all right.
I’m just using pickle.dump in the spawned processes and pickle.load in the main one.
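
The file-based approach might look roughly like the sketch below. The helper names and the results_rank{N}.pkl naming scheme are assumptions made for illustration; the original post does not say how the files were organized.

```python
import os
import pickle
import tempfile


def save_rank_results(out_dir, rank, results):
    # Called inside each spawned process: one file per rank avoids write races.
    path = os.path.join(out_dir, f"results_rank{rank}.pkl")
    with open(path, "wb") as f:
        pickle.dump(results, f)


def load_all_results(out_dir, world_size):
    # Called in the main process after all workers have finished.
    merged = {}
    for rank in range(world_size):
        path = os.path.join(out_dir, f"results_rank{rank}.pkl")
        with open(path, "rb") as f:
            merged[rank] = pickle.load(f)
    return merged


if __name__ == "__main__":
    # Single-process demonstration of the round trip.
    with tempfile.TemporaryDirectory() as out_dir:
        for rank in range(2):
            save_rank_results(out_dir, rank, {"preds": [rank, rank + 1]})
        print(load_all_results(out_dir, world_size=2))
```

With mp.spawn and its default join=True, the main process only continues once every worker has exited, so the files are complete by the time they are loaded.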

arinaruck · February 9, 2021, 10:29am

I actually ran into this problem again recently and decided to use mp.Queue as @mrshenli suggested, so I used

result_queue = mp.Queue()
for rank in range(args.n_gpu):
    mp.Process(target=get_features, args=(rank, args, eval_dataset, result_queue)).start()

to start the processes instead of mp.spawn, and put the results on my queue with

result_queue.put((preds, labels, ranks))

in those processes.
To later collect the results I did the following:

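# Accumulators for the per-process results (skip if already defined).
preds, labels, ranks = [], [], []
for _ in range(args.n_gpu):
    # get() blocks until one of the child processes has put a result.
    temp_result = result_queue.get()
    preds.append(temp_result[0])
    labels.append(temp_result[1])
    ranks.append(temp_result[2])
    del temp_result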
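
For completeness, a self-contained sketch of this Process plus Queue pattern that also keeps the process handles and joins them. It uses Python's standard multiprocessing module so that it runs without a GPU; torch.multiprocessing exposes the same Process, Queue, and get_context names. The body of get_features, the helper unpack_results, and the timeout parameter are assumptions for illustration only.

```python
import multiprocessing as mp


def get_features(rank, n_items, result_queue):
    # Placeholder for real inference: fabricate predictions and labels.
    preds = [rank * 10 + i for i in range(n_items)]
    labels = [i % 2 for i in range(n_items)]
    result_queue.put((rank, preds, labels))


def unpack_results(results):
    """Split (rank, preds, labels) tuples into two flat lists, in rank order."""
    preds, labels = [], []
    for _, rank_preds, rank_labels in sorted(results, key=lambda r: r[0]):
        preds.extend(rank_preds)
        labels.extend(rank_labels)
    return preds, labels


def run(n_procs=2, n_items=3, timeout=60):
    ctx = mp.get_context("spawn")
    result_queue = ctx.Queue()
    processes = [
        ctx.Process(target=get_features, args=(rank, n_items, result_queue))
        for rank in range(n_procs)
    ]
    for p in processes:
        p.start()
    # Drain the queue before joining: a child that still has queued data
    # may not exit until the parent has read it. The timeout raises
    # queue.Empty instead of hanging if a worker died without a result.
    results = [result_queue.get(timeout=timeout) for _ in range(n_procs)]
    for p in processes:
        p.join()
    return unpack_results(results)


if __name__ == "__main__":
    print(run())
```

Sorting by rank matters only if the order of the combined predictions has to match the order in which the dataset was split across processes.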