Sharing Information between DDP Processes

Is there a good way to share information between DDP processes? Even if it’s just something from process-0 to the other processes.

My use-case is that I need to coordinate some data-loading/sampling across my different processes, and it would be good if I could, for example, determine what to sample in process-0 and distribute that information to the other processes.

You can use point-to-point communication or collective operations.

For example, a tensor can be sent with torch.distributed.send and received with torch.distributed.recv by specifying the destination and source ranks.
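As a minimal sketch (assuming the process group is already initialized, e.g. via torch.distributed.init_process_group, and a CPU backend such as gloo; the function and tensor names are illustrative):

```python
import torch
import torch.distributed as dist

def share_tensor_from_rank0():
    rank = dist.get_rank()
    world_size = dist.get_world_size()
    payload = torch.zeros(10)

    if rank == 0:
        payload = torch.randn(10)        # data produced on rank 0
        for dst in range(1, world_size):
            dist.send(payload, dst=dst)  # point-to-point send to each other rank
    else:
        dist.recv(payload, src=0)        # blocking receive from rank 0
    return payload
```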

You can also use collective operations such as broadcast or reduce. I usually use the torch.distributed.all_reduce function to collect loss information across processes.
Example here.
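For reference, a minimal sketch of averaging a loss across ranks (not the example linked above; it assumes the process group is already initialized):

```python
import torch
import torch.distributed as dist

def average_loss(local_loss: torch.Tensor) -> torch.Tensor:
    # Sum the loss across all ranks, then divide by the world size.
    loss = local_loss.clone()
    dist.all_reduce(loss, op=dist.ReduceOp.SUM)
    loss /= dist.get_world_size()
    return loss
```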

Note that if you use the NCCL backend, only CUDA tensors can be used for communication.

In addition to the above response, standard multiprocessing communication mechanisms such as a Queue or mp.Manager should work as well, assuming your processes are on the same machine: https://docs.python.org/3/library/multiprocessing.html
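For instance, here is a sketch of passing a sampling plan from rank 0 to the other workers through a managed queue, under the assumption that all processes are spawned from the same parent with torch.multiprocessing.spawn (names are illustrative):

```python
import torch.multiprocessing as mp

def worker(rank, world_size, queue):
    if rank == 0:
        # Rank 0 decides what to sample and publishes one message per other rank.
        plan = {"epoch": 0, "indices": list(range(100))}
        for _ in range(world_size - 1):
            queue.put(plan)
    else:
        plan = queue.get()  # blocks until rank 0 has pushed the plan
    print(f"rank {rank} sees {len(plan['indices'])} indices")

if __name__ == "__main__":
    world_size = 4
    # A manager-backed queue proxy is picklable, so it can be passed via spawn args.
    manager = mp.Manager()
    queue = manager.Queue()
    mp.spawn(worker, args=(world_size, queue), nprocs=world_size)
```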

If your particular use case is around data sampling, you could also look into DistributedSampler, which will automatically partition data out to DDP ranks for you: https://github.com/pytorch/pytorch/blob/c1e6592964261d2856c84e166a0989684f946697/torch/utils/data/distributed.py#L12
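A typical usage sketch, assuming the process group is already initialized and `dataset` is an ordinary map-style dataset:

```python
import torch.distributed as dist
from torch.utils.data import DataLoader
from torch.utils.data.distributed import DistributedSampler

def build_loader(dataset, batch_size=32):
    # Each rank sees a disjoint shard of the dataset.
    sampler = DistributedSampler(
        dataset,
        num_replicas=dist.get_world_size(),
        rank=dist.get_rank(),
        shuffle=True,
    )
    return DataLoader(dataset, batch_size=batch_size, sampler=sampler), sampler

# At the start of each epoch, call sampler.set_epoch(epoch) so that
# shuffling differs across epochs while staying consistent across ranks.
```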

Finally, we also have APIs such as all_gather_object and broadcast_object_list, which can be used to communicate general picklable Python objects across ranks: https://github.com/pytorch/pytorch/blob/master/torch/distributed/distributed_c10d.py#L1279. These APIs are quite new and subject to significant changes, but they may be useful for your use case.
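For the sampling use case described in the question, a sketch with broadcast_object_list could look like this (assuming the default process group is initialized; the dict contents are illustrative):

```python
import torch.distributed as dist

def broadcast_sampling_plan():
    if dist.get_rank() == 0:
        # Any picklable object works, e.g. a dict describing what to sample.
        objects = [{"indices": [3, 1, 4, 1, 5], "seed": 42}]
    else:
        objects = [None]  # placeholder, filled in by the broadcast
    dist.broadcast_object_list(objects, src=0)
    return objects[0]  # every rank now holds the same plan
```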