I was able to get past this issue by setting `os.environ["NCCL_SOCKET_IFNAME"] = "ens5"`. However, it's still not clear to me why this is needed, since this was working on an older version, so I created an issue here: "NCCL Network is unreachable / Connection refused when initializing DDP" (Issue #68893, pytorch/pytorch on GitHub).
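
For anyone hitting the same error, here is a minimal sketch of the workaround. It assumes the variable has to be set before the process group is initialized, and that `ens5` is the interface name on your machine (it was on mine; yours may differ). The helper names `list_interfaces` and `pin_nccl_interface` are made up for illustration and are not part of PyTorch or NCCL.

```python
import os
import socket


def list_interfaces():
    """Return the network interface names visible to this process."""
    return [name for _, name in socket.if_nameindex()]


def pin_nccl_interface(ifname):
    """Point NCCL at one network interface via NCCL_SOCKET_IFNAME.

    Hypothetical helper: call it before initializing the process group,
    since the environment is read when NCCL sets up its communicators.
    """
    available = list_interfaces()
    if ifname not in available:
        raise ValueError(
            f"interface {ifname!r} not found; available: {available}"
        )
    os.environ["NCCL_SOCKET_IFNAME"] = ifname
    # Optional: NCCL_DEBUG=INFO makes NCCL log which interface it selected.
    os.environ.setdefault("NCCL_DEBUG", "INFO")


# Intended usage (left commented out because it needs torch, GPUs,
# and an interface that is actually named "ens5"):
# import torch.distributed as dist
# pin_nccl_interface("ens5")
# dist.init_process_group(backend="nccl")
```

Checking the name against `list_interfaces()` first is just a convenience: a typo in the interface name otherwise tends to show up later as another connection error rather than as a clear message.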