Why not use multiprocessing in torchvision/references?

Hi, I’m learning to write distributed programs by reading the torchvision source code.

I was confused by the part that handles evaluation.

coco_evaluator is a wrapper for pycocotools.cocoeval.COCOeval.

Once every process has finished its evaluation, all the results need to be gathered.

The core of synchronize_between_processes is torch.distributed.all_gather.

As far as I can tell, it follows these steps:

  1. COCOeval is computed on the CPU, in a single process
  2. synchronize_between_processes transfers the COCO evaluation result to the GPU
  3. all_gather runs across all processes on the GPU
  4. The all_gather result is transferred back to the CPU for the logger
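
To make steps 2 to 4 concrete, here is a minimal sketch of how I understand them. It is not torchvision's actual code: all_gather_objects and run_single_process_demo are hypothetical names, and the demo uses the gloo backend with a single process on the CPU so that it runs without a GPU. With the NCCL backend, device would be "cuda", which matches the steps above.

```python
import os
import pickle
import tempfile

import torch
import torch.distributed as dist


def all_gather_objects(data, device="cpu"):
    """Gather one picklable object per process; returns a list ordered by rank."""
    world_size = dist.get_world_size()

    # Step 2: serialize the CPU result and copy it into a byte tensor on `device`.
    payload = pickle.dumps(data)
    tensor = torch.tensor(list(payload), dtype=torch.uint8, device=device)

    # Exchange payload sizes so the padding can be removed after the gather.
    local_size = torch.tensor([tensor.numel()], dtype=torch.int64, device=device)
    size_list = [torch.zeros_like(local_size) for _ in range(world_size)]
    dist.all_gather(size_list, local_size)
    sizes = [int(s.item()) for s in size_list]
    max_size = max(sizes)

    # Step 3: pad to a common length so every rank sends an equally sized
    # tensor, then gather the byte tensors from all processes.
    padded = torch.zeros(max_size, dtype=torch.uint8, device=device)
    padded[: tensor.numel()] = tensor
    gathered = [torch.empty_like(padded) for _ in range(world_size)]
    dist.all_gather(gathered, padded)

    # Step 4: copy back to the CPU, drop the padding, and deserialize.
    return [
        pickle.loads(bytes(t[:size].cpu().tolist()))
        for size, t in zip(sizes, gathered)
    ]


def run_single_process_demo():
    """Hypothetical one-process run on the gloo backend, only to show the call."""
    rendezvous = os.path.join(tempfile.mkdtemp(), "rendezvous")
    dist.init_process_group(
        "gloo", init_method=f"file://{rendezvous}", rank=0, world_size=1
    )
    try:
        # Stand-in for the per-process evaluation result (made-up data).
        return all_gather_objects({"rank": dist.get_rank(), "img_ids": [1, 2, 3]})
    finally:
        dist.destroy_process_group()
```

Calling run_single_process_demo() returns a list with one entry per process, here just the single dictionary from rank 0.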

This seems like a lot of trouble. Why not just use the CPU directly? There are other ways for processes to communicate, such as a pipe or a queue.

Is there a particular reason to go through the GPU here instead of using multiprocessing.Pipe or multiprocessing.Queue?
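
For comparison, this is roughly the CPU-only alternative I have in mind, as a minimal sketch. evaluate_shard and gather_with_queue are hypothetical names, the per-process "evaluation" is a stand-in that just counts image ids, and the "fork" start method is assumed (Unix only).

```python
import multiprocessing as mp


def evaluate_shard(rank, image_ids, queue):
    # Hypothetical stand-in for the per-process COCO evaluation on the CPU.
    queue.put({"rank": rank, "num_images": len(image_ids)})


def gather_with_queue(shards):
    """Run one worker per shard and collect every result through a queue."""
    # "fork" is an assumption (Unix only); with "spawn" the calling script
    # would need the usual `if __name__ == "__main__":` guard.
    ctx = mp.get_context("fork")
    queue = ctx.Queue()
    workers = [
        ctx.Process(target=evaluate_shard, args=(rank, ids, queue))
        for rank, ids in enumerate(shards)
    ]
    for worker in workers:
        worker.start()
    # Read the results before joining so no worker blocks on a full queue.
    results = [queue.get() for _ in workers]
    for worker in workers:
        worker.join()
    # Results arrive in completion order, so sort them by rank.
    return sorted(results, key=lambda r: r["rank"])
```

Note that a queue like this only connects processes started from a common parent on one machine.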