Deadlock using DistributedDataParallel

Hi, I’m facing a problem: when using DistributedDataParallel with the NCCL backend, my training run hits a deadlock. Following the PyTorch docs, I tried setting the multiprocessing start method to spawn and to forkserver, but then I get an "address already in use" error instead.
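For context, roughly what I tried (a minimal sketch — the launch code around it is omitted, and the worker function is a placeholder):

```python
import torch.multiprocessing as mp

if __name__ == "__main__":
    # Force the start method before launching DDP workers;
    # both "spawn" and "forkserver" led to the
    # "address already in use" error during process group init.
    mp.set_start_method("spawn", force=True)
    # ... launch one worker per GPU, e.g. mp.spawn(worker, ...) ...
```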

I faced a similar error and solved it by initializing the process group first, and only then setting the model's CUDA device (as opposed to the other way around, which led to the same kind of deadlock you describe).
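A minimal sketch of the ordering that worked for me, assuming a single-node setup with one spawned process per GPU; the model, address, and port are placeholders:

```python
import os
import torch
import torch.distributed as dist
import torch.multiprocessing as mp
from torch.nn.parallel import DistributedDataParallel as DDP

def worker(rank, world_size):
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")

    # 1) Initialize the NCCL process group FIRST...
    dist.init_process_group("nccl", rank=rank, world_size=world_size)

    # 2) ...and only then pin this process to its GPU and move the model.
    #    Doing the CUDA device setup before init_process_group was what
    #    deadlocked for me.
    torch.cuda.set_device(rank)
    model = torch.nn.Linear(10, 10).cuda(rank)  # placeholder model
    ddp_model = DDP(model, device_ids=[rank])

    # ... training loop ...

    dist.destroy_process_group()

if __name__ == "__main__":
    world_size = torch.cuda.device_count()
    mp.spawn(worker, args=(world_size,), nprocs=world_size)
```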