dist.init_process_group hangs silently

Hello,

I am trying to get a multi-GPU training session running, but the process keeps hanging on dist.init_process_group() with no error and no INFO message of any kind, even with NCCL_DEBUG=INFO set. Any insights on how to fix this?
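One likely cause of a silent hang at this call, offered as an assumption rather than a confirmed diagnosis for this setup: with the default `env://` init method, `dist.init_process_group` waits until all `world_size` ranks have joined a rendezvous that rank 0 hosts on `MASTER_ADDR:MASTER_PORT`. If `WORLD_SIZE` is larger than the number of processes actually launched, or the other ranks cannot reach that address and port, the call just blocks. Because this happens before any collective runs, `NCCL_DEBUG=INFO` may have nothing to print yet. The helper `check_rendezvous` below is hypothetical (not part of PyTorch): a stdlib-only sketch of a pre-flight check that inspects only the environment variables and TCP reachability, not GPU or NCCL state.

```python
import os
import socket


def check_rendezvous(env=None, timeout=3.0):
    """Hypothetical pre-flight check for the env:// rendezvous.

    Returns a list of human-readable problems. An empty list means the
    required variables are present and consistent and, for non-zero ranks,
    that MASTER_ADDR:MASTER_PORT accepted a TCP connection.
    """
    env = os.environ if env is None else env
    problems = []

    # The env:// init method reads these four variables.
    required = ("MASTER_ADDR", "MASTER_PORT", "RANK", "WORLD_SIZE")
    missing = [name for name in required if not env.get(name)]
    if missing:
        problems.append("missing env vars: " + ", ".join(missing))
        return problems

    try:
        port = int(env["MASTER_PORT"])
        rank = int(env["RANK"])
        world_size = int(env["WORLD_SIZE"])
    except ValueError as exc:
        problems.append("non-integer value: %s" % exc)
        return problems

    # Each process needs a distinct RANK in [0, WORLD_SIZE).
    if not 0 <= rank < world_size:
        problems.append(
            "RANK=%d is outside [0, WORLD_SIZE=%d)" % (rank, world_size))

    # Rank 0 is the process listening on MASTER_PORT, so only the other
    # ranks can probe it (and only once rank 0 has started).
    if rank != 0:
        try:
            with socket.create_connection((env["MASTER_ADDR"], port),
                                          timeout=timeout):
                pass
        except OSError as exc:
            problems.append("cannot reach %s:%d (%s)"
                            % (env["MASTER_ADDR"], port, exc))
    return problems
```

In each worker, one would call `check_rendezvous()` (which reads `os.environ`) right before `dist.init_process_group(...)` and abort with the returned messages if the list is non-empty. A successful probe only shows that the port is open; it cannot detect two processes launched with the same `RANK`.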

Fri Sep  6 15:27:48 2019       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 430.26       Driver Version: 430.26       CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  TITAN V             Off  | 00000000:18:00.0 Off |                  N/A |
| 38%   54C    P8    30W / 250W |      0MiB / 12066MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  TITAN V             Off  | 00000000:3B:00.0 Off |                  N/A |
| 44%   61C    P8    40W / 250W |      0MiB / 12066MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  TITAN Xp            Off  | 00000000:86:00.0 Off |                  N/A |
| 25%   44C    P8    10W / 250W |      2MiB / 12196MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  TITAN Xp            Off  | 00000000:AF:00.0  On |                  N/A |
| 25%   45C    P5    20W / 250W |   1242MiB / 12193MiB |      2%      Default |
+-------------------------------+----------------------+----------------------+
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130
torch==0.4.1   (ALSO tried with 1.1)
torchaudio==0.2
torchsummary==1.5.1
torchtext==0.4.0
torchvision==0.4.0

Were you able to solve this? Mine is hanging too.