By default, both the NCCL and Gloo backends try to find the network interface to use for communication automatically. In our experience, however, this detection does not always succeed. If either backend fails to find the correct network interface, you can try setting the following environment variables (each one applies to its respective backend):
- NCCL_SOCKET_IFNAME, for example:
  export NCCL_SOCKET_IFNAME=eth0
- GLOO_SOCKET_IFNAME, for example:
  export GLOO_SOCKET_IFNAME=eth0
https://pytorch.org/docs/stable/distributed.html#environment-variable-initialization
BTW, use ifconfig (or ip addr on systems without ifconfig) to find the name of your first Ethernet interface, e.g. eth0.
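If you would rather do this from inside the training script than from the shell, something like the following might work. It is only a rough sketch: `list_interfaces`, `pick_interface`, and `export_socket_ifname` are hypothetical helper names (not part of PyTorch), and the "prefer names starting with eth or en" rule is an assumption that may not match your machine. The variables should presumably be set before the process group is initialized.

```python
import os
import socket


def list_interfaces():
    """Return the interface names known to the OS (similar to what ifconfig lists)."""
    return [name for _, name in socket.if_nameindex()]


def pick_interface(names, preferred=("eth", "en")):
    """Pick a non-loopback interface, favoring Ethernet-style prefixes.

    The prefix heuristic is an assumption for illustration only.
    """
    # Drop loopback interfaces such as "lo" (Linux) or "lo0" (macOS).
    candidates = [n for n in names if not n.startswith("lo")]
    for name in candidates:
        if name.startswith(preferred):
            return name
    # Fall back to the first remaining interface, or None if there is none.
    return candidates[0] if candidates else None


def export_socket_ifname(name):
    """Set both variables for this process and any child processes it spawns."""
    os.environ["NCCL_SOCKET_IFNAME"] = name
    os.environ["GLOO_SOCKET_IFNAME"] = name
```

Typical use would be `export_socket_ifname(pick_interface(list_interfaces()))` at the top of the script, after checking that the picked name is not None. Setting the variable for the backend you are not using should be harmless, since each backend only reads its own.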