Hi,
I am trying to initialize torch.distributed and I am stuck.
I have 2 nodes, master and slave, both with PyTorch 1.3.1 installed via Anaconda.
Initialization works on both nodes with:
dist.init_process_group(
    backend="NCCL",
    world_size=2,
    rank=0,  # 0 for master and 1 for slave
    init_method="tcp://192.168.1.102:23458",  # master addr and port
)
It hangs on both nodes with:
store = dist.TCPStore("192.168.1.102", 23458, 2, 0)
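For reference, here is a minimal sketch of what the store-based variant might look like. It assumes that the fourth positional argument of TCPStore is an is_master boolean rather than a rank, and that init_process_group accepts a store= argument in this PyTorch version. I have not confirmed either against 1.3.1. If the assumption holds, passing 0 on both nodes would make both of them clients with no server to connect to, which could explain the hang. The helper names store_args and init_with_store are hypothetical, and the 30-second timeout is an arbitrary choice.

```python
from datetime import timedelta

MASTER_ADDR = "192.168.1.102"  # master address from the post
MASTER_PORT = 23458            # master port from the post


def store_args(rank, world_size=2):
    """Build the positional arguments for TCPStore (hypothetical helper).

    Assumption: the 4th argument is is_master, so it should be True
    on exactly one process (rank 0 here) and False on all others.
    """
    return (MASTER_ADDR, MASTER_PORT, world_size, rank == 0)


def init_with_store(rank, world_size=2):
    """Create the store and pass it to init_process_group (hypothetical helper)."""
    import torch.distributed as dist  # imported lazily so store_args works without torch

    # Assumption: a finite timeout makes a failed rendezvous raise instead of blocking.
    store = dist.TCPStore(*store_args(rank, world_size), timedelta(seconds=30))
    dist.init_process_group(
        backend="nccl",
        store=store,
        rank=rank,
        world_size=world_size,
    )
    return store
```

With this sketch, the master would call init_with_store(0) and the slave would call init_with_store(1).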
Could somebody help?
Thanks in advance and Happy New Year.