How does one resolve the following warning in PyTorch?
[W ProcessGroupGloo.cpp:558] Warning: Unable to resolve hostname to a (local) address. Using the loopback address as fallback. Manually set the network interface to bind to with GLOO_SOCKET_IFNAME. (function operator())
I read:
- PyTorch distributed example code hang. Deadlock? - #12 by Brando_Miranda
- Distributed communication package - torch.distributed — PyTorch 1.7.1 documentation
- and more
but they weren’t terribly helpful.
My questions are:
- what does this warning mean?
- how do I resolve it when using PyTorch?
I am trying to run this on my local machine for debugging purposes. It doesn’t work in PyCharm, in the PyCharm debugger, or in a local terminal. It seems to work on the clusters for some reason.
Why is this and how do I fix it?
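For concreteness, here is a minimal sketch of what the warning text seems to be asking for on a single machine: bind Gloo to the loopback interface explicitly through GLOO_SOCKET_IFNAME, and point MASTER_ADDR and MASTER_PORT at localhost before creating the process group. The helper names (`loopback_interface_name`, `configure_local_env`, `init_single_process_group`), the port 29500, and the guess that the loopback interface starts with "lo" ("lo" on Linux, "lo0" on macOS) are all assumptions for illustration, not something taken from the PyTorch docs. Whether this removes the warning on a given machine is untested here.

```python
import os
import socket


def loopback_interface_name(default="lo"):
    """Guess the loopback interface name (assumed to start with "lo")."""
    try:
        names = [name for _, name in socket.if_nameindex()]
    except (AttributeError, OSError):
        # if_nameindex is not available everywhere; fall back to the default.
        return default
    for name in names:
        if name.startswith("lo"):
            return name
    return default


def configure_local_env(port=29500, ifname=None):
    """Set the variables read by init_method="env://" for a local run."""
    os.environ["MASTER_ADDR"] = "127.0.0.1"
    os.environ["MASTER_PORT"] = str(port)  # assumed to be a free port
    os.environ["GLOO_SOCKET_IFNAME"] = ifname or loopback_interface_name()
    keys = ("MASTER_ADDR", "MASTER_PORT", "GLOO_SOCKET_IFNAME")
    return {key: os.environ[key] for key in keys}


def init_single_process_group():
    """Hypothetical helper: a one-process Gloo group for local debugging."""
    # Imported lazily so the environment setup above works without torch.
    import torch.distributed as dist

    configure_local_env()
    dist.init_process_group(
        backend="gloo", init_method="env://", rank=0, world_size=1
    )
    return dist
```

The environment variables have to be set before `init_process_group` is called, which is why the sketch does it in a separate function. In a multi-process run, each worker would call the same setup with its own rank and the shared world size.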
Cross-posted:
- How does one set the PyTorch distributed hostname, port, and GLOO_SOCKET_IFNAME so that DDP works? - Quora
- python - How does one set the pytorch distributed hostname, port and GLOO_SOCKET_IFNAME so that DDP works? - Stack Overflow
- https://www.reddit.com/r/pytorch/comments/lw1tkv/how_does_one_set_the_pytorch_distributed_hostname/