I want to try PyTorch's distributed package (torch.distributed). I am using two nodes, but the second node reports an error during startup:
Traceback (most recent call last):
  File "test.py", line 205, in <module>
    main()
  File "test.py", line 198, in main
    rank=args.rank,
  File "/home/loring/anaconda3/lib/python3.7/site-packages/torch/distributed/distributed_c10d.py", line 410, in init_process_group
    timeout=timeout)
  File "/home/loring/anaconda3/lib/python3.7/site-packages/torch/distributed/distributed_c10d.py", line 478, in _new_process_group_helper
    timeout=timeout)
RuntimeError: [/opt/conda/conda-bld/pytorch_1570710822989/work/third_party/gloo/gloo/transport/tcp/pair.cc:761] connect [::1]:14728: Connection refused
Can anyone help me? Thanks.
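
One observation on the error: `[::1]` is the IPv6 loopback address, so the connection is being attempted against the local machine rather than the other node. A possible cause (an assumption, not confirmed for this setup) is that a node's hostname resolves to a loopback address and Gloo advertises that address to its peer. Below is a minimal sketch for checking this on each node using only the Python standard library; the helper names `is_loopback` and `resolved_addresses` are hypothetical and not part of PyTorch.

```python
import ipaddress
import socket


def is_loopback(address: str) -> bool:
    """Return True if `address` is an IPv4 or IPv6 loopback address."""
    # Strip an IPv6 zone index such as "%eth0" before parsing.
    return ipaddress.ip_address(address.split("%")[0]).is_loopback


def resolved_addresses(hostname: str) -> list:
    """Return the unique addresses that `hostname` resolves to."""
    try:
        infos = socket.getaddrinfo(hostname, None)
    except socket.gaierror:
        # Hostname could not be resolved at all.
        return []
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the address string.
    return sorted({info[4][0] for info in infos})


if __name__ == "__main__":
    host = socket.gethostname()
    for addr in resolved_addresses(host):
        flag = "loopback" if is_loopback(addr) else "ok"
        print(f"{host} -> {addr} ({flag})")
```

If the hostname does map to a loopback address, setting the `GLOO_SOCKET_IFNAME` environment variable to the name of the real network interface on each node, or correcting the hostname entry in `/etc/hosts`, may be worth trying.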