Is there a good way to share information between DDP processes? Even if it’s just something from process-0 to the other processes.
My use-case is that I need to coordinate some data-loading/sampling across my different processes, and it would be good if I could, for example, determine what to sample in process-0 and distribute that information to the other processes.
You can use point-to-point communication or collective operations. For example, a tensor can be sent with torch.distributed.send and received with torch.distributed.recv by specifying the source and destination ranks.
You can also reduce values across processes if you wish; I usually use the torch.distributed.all_reduce function to aggregate loss information between processes.
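Roughly like this — a single-rank process group is set up here purely for illustration (address, port, and loss value are hypothetical); in a real DDP job the launcher supplies the rank and world size and every rank runs the same lines:

```python
import os
import torch
import torch.distributed as dist

# One-process group just for illustration; in a real DDP job every rank
# joins the same group and contributes its own local loss.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29501")
dist.init_process_group("gloo", rank=0, world_size=1)

loss = torch.tensor([0.25])
# After all_reduce, every rank holds the sum of all local losses in-place.
dist.all_reduce(loss, op=dist.ReduceOp.SUM)
mean_loss = loss / dist.get_world_size()
print(mean_loss.item())
dist.destroy_process_group()
```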
If you use the nccl backend, you can only communicate CUDA tensors; for CPU tensors, use the gloo backend.
In addition to the above response, standard multiprocessing communication methods such as mp.Manager should work as well (https://docs.python.org/3/library/multiprocessing.html), assuming your processes are on the same machine.
If your particular use case is around data sampling, you could also look into DistributedSampler, which will automatically partition data out to DDP ranks for you: https://github.com/pytorch/pytorch/blob/c1e6592964261d2856c84e166a0989684f946697/torch/utils/data/distributed.py#L12
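To show the partitioning without launching a DDP job, num_replicas and rank can be passed explicitly (inside an initialized process group they default to the group's size and rank); the toy dataset below is made up:

```python
from torch.utils.data import Dataset, DistributedSampler

class ToyDataset(Dataset):
    def __len__(self):
        return 6

    def __getitem__(self, i):
        return i

dataset = ToyDataset()
# Each rank's sampler yields a disjoint subset of the dataset indices;
# shuffle=False keeps the split deterministic for inspection.
shards = [
    list(DistributedSampler(dataset, num_replicas=2, rank=r, shuffle=False))
    for r in range(2)
]
print(shards)
```

In training you would pass the sampler to a DataLoader and call sampler.set_epoch(epoch) each epoch so shuffling differs between epochs.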
Finally, we also have APIs such as broadcast_object_list, which can be used to communicate general picklable Python objects across ranks: https://github.com/pytorch/pytorch/blob/master/torch/distributed/distributed_c10d.py#L1279. These APIs are quite new and subject to significant change, but may be useful for your use case.
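This fits the original use case directly: rank 0 builds an arbitrary Python object describing what to sample and broadcasts it. A single-rank sketch (address, port, and the dict contents are hypothetical; with more ranks, every process runs the same lines and the non-source ranks receive rank 0's objects):

```python
import os
import torch.distributed as dist

os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29502")
dist.init_process_group("gloo", rank=0, world_size=1)

# Rank 0 fills in the list; other ranks pass placeholders of the same length.
if dist.get_rank() == 0:
    objects = [{"epoch": 3, "sample_indices": [3, 1, 4]}]
else:
    objects = [None]
dist.broadcast_object_list(objects, src=0)
print(objects[0])
dist.destroy_process_group()
```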