Server socket cannot connect!

Why would I get an error message like this? I am setting master_addr to localhost and letting the master port be the default. I am using a one-node, 4-GPU system. Any help will be appreciated.

python -m torch.distributed.launch --nproc_per_node=4 --nnodes=1 --node_rank=0 --master_addr="localhost" train.py configs/check.py

@just_started_coding Since you are launching the training program in a single-node setting, I don't think you need to specify master_addr at all; nnodes and node_rank also already default to 1 and 0. See the doc: Distributed communication package - torch.distributed — PyTorch 1.12 documentation
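
For a single node, the launch line can be trimmed to just the per-process count; the launcher defaults master_addr to 127.0.0.1 and picks a default master port:

python -m torch.distributed.launch --nproc_per_node=4 train.py configs/check.py

Note that torch.distributed.launch is deprecated in recent releases in favor of torchrun, which in this case is equivalent:

torchrun --nproc_per_node=4 train.py configs/check.py

For this to work, train.py has to initialize the process group from the environment variables the launcher exports (MASTER_ADDR, MASTER_PORT, RANK, WORLD_SIZE). A minimal sketch, assuming NCCL and the env:// rendezvous (the argument names here are illustrative, not necessarily what your actual script does):

import argparse
import os

import torch
import torch.distributed as dist

parser = argparse.ArgumentParser()
parser.add_argument("config")  # e.g. configs/check.py
# torch.distributed.launch passes --local_rank to each process;
# with torchrun you would read int(os.environ["LOCAL_RANK"]) instead
parser.add_argument("--local_rank", type=int,
                    default=int(os.environ.get("LOCAL_RANK", 0)))
args = parser.parse_args()

# The launcher exports MASTER_ADDR, MASTER_PORT, RANK, and WORLD_SIZE,
# so init_method="env://" needs no explicit address or port.
dist.init_process_group(backend="nccl", init_method="env://")
torch.cuda.set_device(args.local_rank)

print(f"rank {dist.get_rank()} of {dist.get_world_size()} initialized")

If the socket error persists even with the defaults, it usually means the default master port (29500) is already taken on the machine, in which case passing a free port via --master_port is worth a try.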