init_process_group freezing

I am using the following repo to train a TTS model on my own dataset: Tacotron 2

Everything works fine, but when I set distributed_run = True, the process gets stuck in the init_process_group function at line 35 (I set world_size = 3). My GPU driver and CUDA are okay. Is this a bug? If so, how can I fix it?


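Hi, can you confirm that all ranks reach line 35? init_process_group blocks until all world_size processes have joined, so if even one rank never gets there, the others will wait.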
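
One way to check this is to print from every process right before and right after the call. Below is a minimal sketch, not the repo's actual code: the helper names rank_report and init_with_logging are hypothetical, and it assumes the processes are launched with the RANK, WORLD_SIZE, MASTER_ADDR and MASTER_PORT environment variables set (the env:// init method), which may differ from how the training script passes its rank and URL.

```python
import os
import sys


def rank_report(env=None):
    """Return a one-line summary of the rendezvous settings this process sees.

    Hypothetical helper for debugging; reads the variables used by env://.
    """
    env = os.environ if env is None else env
    keys = ("RANK", "WORLD_SIZE", "MASTER_ADDR", "MASTER_PORT")
    return " ".join(f"{k}={env.get(k, '<unset>')}" for k in keys)


def init_with_logging(backend="nccl"):
    """Log before and after init_process_group so a missing rank is visible.

    Assumption: the launcher exported the env:// variables for each process.
    """
    # Imported lazily so rank_report() can be used without PyTorch installed.
    import torch.distributed as dist

    pid = os.getpid()
    print(f"[pid {pid}] before init: {rank_report()}", file=sys.stderr, flush=True)

    # Blocks until all WORLD_SIZE processes have joined the group.
    dist.init_process_group(backend=backend, init_method="env://")

    print(
        f"[pid {pid}] after init: rank {dist.get_rank()} of {dist.get_world_size()}",
        file=sys.stderr,
        flush=True,
    )


if __name__ == "__main__":
    # Without PyTorch or a launcher, this just shows what the process would see.
    print(rank_report())
```

If you see fewer "before init" lines than your world_size (3 in your case), some process never reached the call, and the ones that did will sit there waiting for it. If all three "before init" lines appear but no "after init" line does, the problem is more likely in the rendezvous itself (address, port, or backend).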

No, they don't. Thanks, good point! The bug is in the NVIDIA code.