I have a question about point-to-point (p2p) communication in torch.distributed. Suppose we set up a group of 3 processes with `init_process_group(backend='gloo', init_method="tcp://", rank=args.rank, world_size=3)` on three different nodes with IP to When we send tensors from to, how is the underlying network traffic routed? Does it go directly from to, or from to and then to? The answer is probably obvious, but I couldn't find it in the docs. Thanks in advance!


Hey @yijing

The message is sent directly from to

In `init_process_group`, `init_method="tcp://"` is only used for rendezvous, i.e., all processes use the same ip:port to find each other. After that, communication doesn't need to go through the master.
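A minimal sketch of the setup in the question, assuming PyTorch with the Gloo backend on a POSIX system (it uses the `fork` start method). To keep it self-contained, the three nodes are simulated as three local processes on 127.0.0.1 and the rendezvous port is picked at runtime; on real nodes each rank would run on its own machine and all ranks would put the master's IP in `tcp://`. Rank 1 sends a tensor to rank 2 with `dist.send`/`dist.recv`; rank 0 only hosts the rendezvous store and is not part of that transfer. The helper names `free_port`, `worker` and `run` are illustrative.

```python
import datetime
import multiprocessing as mp
import socket

import torch
import torch.distributed as dist


def free_port() -> int:
    # Ask the OS for an unused TCP port to use as the rendezvous port.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))
        return s.getsockname()[1]


def worker(rank, world_size, port, results):
    # Every rank points at the same tcp://host:port. With tcp://, rank 0
    # hosts the key-value store that ranks use to find each other.
    dist.init_process_group(
        backend="gloo",
        init_method=f"tcp://127.0.0.1:{port}",
        rank=rank,
        world_size=world_size,
        timeout=datetime.timedelta(seconds=60),
    )
    if rank == 1:
        # Point-to-point send to rank 2; rank 0 takes no part in this call.
        dist.send(torch.tensor([1.0, 2.0, 3.0]), dst=2)
    elif rank == 2:
        buf = torch.zeros(3)
        dist.recv(buf, src=1)
        results.put(buf.tolist())
    # Synchronize all ranks (including the store host) before tearing down.
    dist.barrier()
    dist.destroy_process_group()


def run(world_size=3):
    # "fork" (POSIX only) avoids re-importing this module in the children.
    ctx = mp.get_context("fork")
    port = free_port()
    results = ctx.SimpleQueue()
    procs = [
        ctx.Process(target=worker, args=(r, world_size, port, results))
        for r in range(world_size)
    ]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    assert all(p.exitcode == 0 for p in procs)
    return results.get()


if __name__ == "__main__":
    print(run())  # [1.0, 2.0, 3.0]
```

On real nodes you would drop `run()` and start `worker(args.rank, 3, <port>, ...)` on each machine with the master's IP instead of 127.0.0.1.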

BTW, if you are using p2p communication, torch.distributed.rpc might be useful too. Here is a tutorial.
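For comparison, a minimal torch.distributed.rpc sketch, assuming the TensorPipe RPC backend and the same local three-process simulation as above (POSIX, `fork` start method, port chosen at runtime). The worker names `worker0`..`worker2` and the helpers `free_port`, `worker` and `run` are illustrative. Here `worker1` asks `worker2` to run `torch.add` and gets the result back with `rpc.rpc_sync`; `worker0` only participates in rendezvous and shutdown.

```python
import multiprocessing as mp
import socket

import torch
import torch.distributed.rpc as rpc


def free_port() -> int:
    # Ask the OS for an unused TCP port to use as the rendezvous port.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))
        return s.getsockname()[1]


def worker(rank, world_size, port, results):
    # Same rendezvous idea as init_process_group: every worker uses one tcp:// URL.
    opts = rpc.TensorPipeRpcBackendOptions(init_method=f"tcp://127.0.0.1:{port}")
    rpc.init_rpc(
        f"worker{rank}",
        backend=rpc.BackendType.TENSORPIPE,
        rank=rank,
        world_size=world_size,
        rpc_backend_options=opts,
    )
    if rank == 1:
        # Run torch.add on worker2 and block until its result comes back.
        out = rpc.rpc_sync("worker2", torch.add, args=(torch.ones(2), 1))
        results.put(out.tolist())
    # Graceful shutdown waits for all workers to finish outstanding RPCs.
    rpc.shutdown()


def run(world_size=3):
    ctx = mp.get_context("fork")  # POSIX only
    port = free_port()
    results = ctx.SimpleQueue()
    procs = [
        ctx.Process(target=worker, args=(r, world_size, port, results))
        for r in range(world_size)
    ]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    assert all(p.exitcode == 0 for p in procs)
    return results.get()


if __name__ == "__main__":
    print(run())  # [2.0, 2.0]
```

The difference from `dist.send`/`dist.recv` is that the caller names a remote function and receives its return value, instead of both sides having to post matching send and receive calls.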

