TCP initialization across different AWS Regions

Hi~ I want to use TCP initialization in torch.distributed.init_process_group across different AWS Regions. I have opened all traffic on the EC2 instances, and they can communicate with each other over sockets using their public IPs. However, when I use TCP initialization to initialize the process group, the master node only stores the private IPs of the other nodes in the TCPStore, which I guess is why they cannot communicate.
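A minimal sketch of the kind of TCP initialization described here, for reference. The environment variable names (MASTER_PUBLIC_IP, NODE_RANK, NUM_NODES), the port 29500, and the gloo backend are assumptions made for illustration and are not taken from the original setup.

```python
import os


def tcp_init_method(master_ip: str, port: int) -> str:
    """Build the tcp:// rendezvous URL that init_process_group accepts."""
    return f"tcp://{master_ip}:{port}"


def init_across_regions() -> None:
    # Imported here so the URL helper above can be used without PyTorch installed.
    import torch.distributed as dist

    # Hypothetical environment variables; PyTorch does not set or read these.
    master_ip = os.environ["MASTER_PUBLIC_IP"]  # public IP of the rank-0 instance
    rank = int(os.environ["NODE_RANK"])         # 0 on the master, 1..N-1 elsewhere
    world_size = int(os.environ["NUM_NODES"])   # total number of processes

    dist.init_process_group(
        backend="gloo",  # assumption: CPU backend; "nccl" is the usual GPU choice
        init_method=tcp_init_method(master_ip, 29500),  # assumption: port 29500
        rank=rank,
        world_size=world_size,
    )
```

Each node would call init_across_regions() with the same MASTER_PUBLIC_IP and its own NODE_RANK. This only controls how the nodes reach the store on the master; it does not change which address each node registers for its peers, which is the behavior in question.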

Can I force the initialization process to store the public IPs of the other nodes? Or is there any other way to let nodes in different regions (which can reach each other over public IPs) use torch.distributed? Thanks for any help!

cc @cbalioglu, do you have any insight into this?

@HuYang719 How do you initiate your jobs right now? Using python -m torch.distributed.run?