dist.init_process_group hangs with multiple GPUs

I am trying to initialize the data-parallel process group on four GPUs as follows:

rank = 0
os.environ['MASTER_ADDR'] = ''
os.environ['MASTER_PORT'] = str(34567 + local_rank)
dist.init_process_group("gloo", rank=0, world_size=4)

With world_size = 4 the call hangs forever, but changing it to 1 works. I also tried NCCL instead of Gloo, but it made no difference.

Hi, you need to spawn 4 processes and call init_process_group once in each of them, passing each process its own rank (0 through 3). The call blocks until all world_size ranks have joined the rendezvous, so a single process with rank=0 and world_size=4 will wait forever for the other three ranks. That also explains why world_size=1 works and why switching the backend changes nothing. All processes must also agree on MASTER_ADDR and MASTER_PORT, so deriving the port from local_rank as in your snippet would put each process on a different rendezvous even once you do spawn four of them.
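A minimal sketch of that pattern, assuming a single machine with the rendezvous on localhost (the address, port, and worker function name here are illustrative, not from your setup):

```python
import os
import torch
import torch.distributed as dist
import torch.multiprocessing as mp

def run_worker(rank, world_size):
    # Every process must agree on the SAME rendezvous address and port.
    os.environ["MASTER_ADDR"] = "127.0.0.1"  # assumption: single-machine setup
    os.environ["MASTER_PORT"] = "34567"      # one port shared by all ranks
    # Each process passes its OWN rank; this call blocks until all
    # world_size ranks have reached it.
    dist.init_process_group("gloo", rank=rank, world_size=world_size)
    # Quick sanity check: all-reduce the rank, result is sum(0..world_size-1).
    t = torch.tensor([float(rank)])
    dist.all_reduce(t)
    print(f"rank {rank}: all_reduce result {t.item()}")
    dist.destroy_process_group()

if __name__ == "__main__":
    world_size = 4
    # Spawns world_size processes, each calling run_worker(rank, world_size).
    mp.spawn(run_worker, args=(world_size,), nprocs=world_size, join=True)
```

With four ranks, each process prints an all-reduce result of 6.0 (0+1+2+3), confirming the group initialized and can communicate. For NCCL you would additionally pin each rank to its own GPU, e.g. with torch.cuda.set_device(rank), before creating CUDA tensors.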