The issue may be that host 1 is not in the same subnet as the IP address specified in `NCCL_COMM_ID`. It is not related to the port specified in `NCCL_COMM_ID`.
For example, if host 0 (whose address is used in `NCCL_COMM_ID`) is in subnet 192.168.1.xx and host 1 is in subnet 10.0.1.xx, this won’t work.
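To check this quickly, here is a minimal Python sketch that splits an `NCCL_COMM_ID` value of the form `<ip>:<port>` and tests whether another host's IP falls in the same subnet. It assumes IPv4 and a /24 netmask, which matches the `.xx` example above. Your actual netmask may differ, so confirm it on each host (for example with `ip addr`). The helper names, addresses, and port are hypothetical and chosen only for illustration.

```python
import ipaddress


def parse_comm_id(comm_id: str) -> tuple[str, int]:
    """Split an NCCL_COMM_ID-style "<ip>:<port>" string (IPv4 only in this sketch)."""
    host, _, port = comm_id.rpartition(":")
    return host, int(port)


def same_subnet(ip_a: str, ip_b: str, prefix_len: int = 24) -> bool:
    """Return True if ip_b lies in the /prefix_len network containing ip_a.

    The default /24 is an assumption; pass the real prefix length from your hosts.
    """
    net = ipaddress.ip_network(f"{ip_a}/{prefix_len}", strict=False)
    return ipaddress.ip_address(ip_b) in net


if __name__ == "__main__":
    # Hypothetical values: host 0's address in NCCL_COMM_ID, and host 1's address.
    comm_ip, comm_port = parse_comm_id("192.168.1.10:29500")
    print(same_subnet(comm_ip, "192.168.1.20"))  # same /24
    print(same_subnet(comm_ip, "10.0.1.5"))      # different subnet, the failing case
```

The port is parsed but deliberately not used in the check, since the port is not what causes this problem.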