How can I get returns from a function in distributed data parallel?

seungjun · May 3, 2021, 2:29am

Hi, you can use

- torch.multiprocessing.SimpleQueue, so that the child processes can put their results in a queue that the parent process reads.
- point-to-point communication functions (torch.distributed.send and torch.distributed.recv) to send tensors between the distributed processes.
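
Here is a minimal sketch of the queue approach. The names `worker` and `run_workers` are hypothetical, and the squared rank just stands in for whatever your per-rank function actually returns. I use the "fork" start method only to keep the sketch short (it is not available on Windows); with CUDA you would normally use "spawn" and put the launch code under `if __name__ == "__main__":`.

```python
try:
    # torch.multiprocessing re-exports the standard multiprocessing API
    import torch.multiprocessing as mp
except ImportError:
    # Fallback so the sketch still runs where PyTorch is not installed
    import multiprocessing as mp


def worker(rank, queue):
    # Hypothetical stand-in for the function each DDP process runs.
    result = rank * rank
    # Tag the result with the rank so the parent knows where it came from.
    queue.put((rank, result))


def run_workers(world_size):
    # Assumption: "fork" is used for brevity; prefer "spawn" with CUDA.
    ctx = mp.get_context("fork")
    queue = ctx.SimpleQueue()
    procs = [
        ctx.Process(target=worker, args=(rank, queue))
        for rank in range(world_size)
    ]
    for p in procs:
        p.start()
    # Read one item per child before joining, so no child blocks on put().
    results = dict(queue.get() for _ in range(world_size))
    for p in procs:
        p.join()
    return results
```

Calling `run_workers(world_size)` in the parent gives a dict that maps each rank to the value that rank put in the queue.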

You may want to refer to this thread for more explanation.