Connection refused with GLOO process group initialization

taehyunzzz · March 2, 2022, 5:04pm

As @cbalioglu pointed out to us, the master IP and port are used only for the initial rendezvous. Communication after rendezvous goes over randomly chosen ports, so your issue may be a firewall blocking those ports.
This could be risky security-wise, but I solved a similar problem by allowing all traffic between the two nodes:
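One way to check whether you are hitting this is to probe TCP ports on the other node yourself. Being able to reach MASTER_PORT only shows that rendezvous can work; it says nothing about the other ports Gloo picks later. Below is a minimal stdlib sketch of such a probe. The helper name `can_connect` is hypothetical, and note that a port with nothing listening on it also fails, so start a listener on the peer first (for example `python3 -m http.server 29999`) before blaming the firewall.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP handshake with (host, port) succeeds within `timeout` seconds."""
    try:
        # create_connection resolves `host` and performs the TCP handshake;
        # the with-block closes the socket again immediately.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # ConnectionRefusedError, timeouts (socket.timeout is an OSError subclass)
        # and "no route to host" errors all end up here.
        return False
```

For example, run `can_connect("10.0.0.1", 29500)` and `can_connect("10.0.0.1", 29999)` from the second node, with placeholder addresses and ports. If the first call succeeds but the second fails while something is listening on 29999, a firewall that only admits the master port is a likely culprit.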

node 0 : sudo ufw allow from [node1 IP]
node 1 : sudo ufw allow from [node0 IP]
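With more than two nodes, the same idea means each node needs one `ufw allow from` rule per peer. Here is a small sketch that prints those commands, assuming you know every node's IP up front. The helper name and the addresses are hypothetical, and each rule still opens all ports to that peer, which is the same security trade-off as above.

```python
from __future__ import annotations


def ufw_allow_commands(node_ips: list[str]) -> dict[str, list[str]]:
    """For each node IP, list the ufw rules that admit traffic from every other node."""
    rules: dict[str, list[str]] = {}
    for ip in node_ips:
        # One "allow from" rule per peer; a node never needs a rule for itself.
        rules[ip] = [f"sudo ufw allow from {peer}" for peer in node_ips if peer != ip]
    return rules


if __name__ == "__main__":
    # Hypothetical private addresses; substitute your own nodes' IPs.
    for node, cmds in ufw_allow_commands(["10.0.0.1", "10.0.0.2", "10.0.0.3"]).items():
        print(f"# on {node}")
        for cmd in cmds:
            print(cmd)
```

With two IPs this reproduces exactly the two rules shown above.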

I hope a future update makes this unnecessary… Hope this helps, though!