Hi,
I have a question about point-to-point (p2p) communication in torch.distributed. Suppose we set up a group of 3 processes by calling init_process_group(backend='gloo', init_method='tcp://10.0.0.1:8888', rank=args.rank, world_size=3) on three different nodes with IPs 10.0.0.1 to 10.0.0.3. When we send a tensor from 10.0.0.2 to 10.0.0.3, how is the underlying network traffic routed? Does it go directly from 10.0.0.2 to 10.0.0.3, or from 10.0.0.2 to 10.0.0.1 and then on to 10.0.0.3? The answer is probably obvious, but I couldn't find it in the docs. Thanks in advance!
Yijing
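
For concreteness, the setup described above might look roughly like the sketch below. It is only an illustration under a few assumptions: ranks 0, 1 and 2 run on 10.0.0.1, 10.0.0.2 and 10.0.0.3 respectively, and the helper names parse_rank and run, the --rank flag, and the tensor shape are hypothetical, chosen just for this example.

```python
import argparse

import torch
import torch.distributed as dist

# Values taken from the question; rank 0 is assumed to run on 10.0.0.1,
# since the tcp:// address must belong to the rank 0 process.
INIT_METHOD = "tcp://10.0.0.1:8888"
WORLD_SIZE = 3


def parse_rank(argv=None) -> int:
    """Read this node's rank from a (hypothetical) --rank flag."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--rank", type=int, required=True)
    return parser.parse_args(argv).rank


def run(rank: int) -> None:
    """Join the 3-process gloo group, then send one tensor from rank 1 to rank 2."""
    dist.init_process_group(
        backend="gloo",
        init_method=INIT_METHOD,
        rank=rank,
        world_size=WORLD_SIZE,
    )

    tensor = torch.zeros(4)  # placeholder payload; the shape is arbitrary
    if rank == 1:
        # Assumed to be the node at 10.0.0.2.
        tensor += 1
        dist.send(tensor, dst=2)
    elif rank == 2:
        # Assumed to be the node at 10.0.0.3.
        dist.recv(tensor, src=1)
    # Rank 0 (10.0.0.1) takes no part in the send/recv itself.

    dist.destroy_process_group()
```

Each node would call run(parse_rank()), for example from a script started with --rank 1 on 10.0.0.2. The question is which path the tensor sent by dist.send takes on the network in this arrangement.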