How does one set the PyTorch distributed hostname, port, and GLOO_SOCKET_IFNAME so that DDP works?

How does one resolve the following error in PyTorch?

[W ProcessGroupGloo.cpp:558] Warning: Unable to resolve hostname to a (local) address. Using the loopback address as fallback. Manually set the network interface to bind to with GLOO_SOCKET_IFNAME. (function operator())

I read:

My questions are:

  1. What does this error mean?
  2. How do I resolve it when using PyTorch?

I am trying to run this on my local machine for debugging purposes. It doesn't work in PyCharm, in the PyCharm debugger, or in a local terminal. On the clusters it seems to work for some reason.

Why is this and how do I fix it?


cross posted:

Hi, I get the same error. Have you solved it?
I can initialize the process group across nodes with the nccl backend, but it fails with gloo.

Just resolved it by setting an environment variable: GLOO_SOCKET_IFNAME=eth0
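
For anyone debugging locally, here is a rough sketch of the same fix done from Python instead of the shell. It is only an illustration under some assumptions: the helper names (pick_interface, configure_gloo_env, smoke_test) are made up for this post, port 29500 is an arbitrary free port, and the interface names are guesses (loopback is usually "lo" on Linux and "lo0" on macOS; "eth0" is what worked above on a cluster). The variables have to be set before init_process_group is called.

```python
import os
import socket


def pick_interface(preferred=("lo", "lo0", "eth0")):
    """Return the first preferred interface present on this host, else None."""
    available = [name for _index, name in socket.if_nameindex()]
    for name in preferred:
        if name in available:
            return name
    return None


def configure_gloo_env(ifname, master_addr="127.0.0.1", master_port=29500):
    """Export the variables read by Gloo and by the default env:// rendezvous."""
    os.environ["GLOO_SOCKET_IFNAME"] = ifname
    os.environ["MASTER_ADDR"] = master_addr
    os.environ["MASTER_PORT"] = str(master_port)  # environment values must be strings


def smoke_test():
    """Start and tear down a one-process Gloo group; skipped if torch is missing."""
    try:
        import torch.distributed as dist
    except ImportError:
        print("torch is not installed; skipping the smoke test")
        return False
    # With no init_method given, PyTorch uses env:// and reads
    # MASTER_ADDR / MASTER_PORT from the environment set above.
    dist.init_process_group(backend="gloo", rank=0, world_size=1)
    print("initialized:", dist.is_initialized())
    dist.destroy_process_group()
    return True


if __name__ == "__main__":
    # Assumption: fall back to "lo" if none of the guessed names exist.
    configure_gloo_env(pick_interface() or "lo")
    smoke_test()
```

On a multi-node cluster you would presumably pass the real interface name (for example pick_interface(preferred=("eth0",))) and the rank-0 host as master_addr instead of 127.0.0.1. Running ip addr (Linux) or ifconfig (macOS) shows which interface names actually exist on your machine.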