Hi, I want to use TCP initialization in torch.distributed.init_process_group across different AWS regions. I have opened all traffic on the AWS EC2 instances, and they can communicate with each other over plain sockets using their public IPs. However, when I use TCP initialization to initialize the process group, the master node only stores the private IPs of the other nodes in the TCPStore, which I guess is why they cannot communicate.
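For concreteness, the setup described above corresponds roughly to a call like the one below. This is only a minimal sketch, not the exact script: the address 203.0.113.10 (a documentation-range placeholder for the master's public IP), port 29500, the gloo backend, and the helper names tcp_init_method and init_distributed are all assumptions made for illustration.

```python
def tcp_init_method(master_public_ip: str, port: int) -> str:
    """Build the tcp:// rendezvous URL passed as init_method (hypothetical helper)."""
    return f"tcp://{master_public_ip}:{port}"


def init_distributed(master_public_ip: str, port: int, rank: int,
                     world_size: int, backend: str = "gloo"):
    """Join the process group via TCP initialization (hypothetical helper)."""
    # Imported lazily so the URL helper above works without PyTorch installed.
    import torch.distributed as dist

    dist.init_process_group(
        backend=backend,  # assumption: gloo; the real backend may differ
        init_method=tcp_init_method(master_public_ip, port),
        rank=rank,              # 0 on the master node, 1..world_size-1 elsewhere
        world_size=world_size,  # total number of processes across all regions
    )
    return dist


if __name__ == "__main__":
    # Placeholder values; each node would call init_distributed with its own rank.
    print(tcp_init_method("203.0.113.10", 29500))
```

Every node points init_method at the master's public IP, so the initial rendezvous with the store can succeed; the problem described above shows up afterwards, when the nodes try to reach each other using the addresses they registered.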
Can I force the initialization process to store the public IPs of the other nodes? Or is there another way to let nodes in different regions (which can reach each other via public IPs) use torch.distributed? Thanks for any help!