Hangs without any output while converting the model with DistributedDataParallel

According to a network engineer from Nvidia, the problem is caused by the hardware… The chips need to support RoCE, such as MCX4 or MCX5…
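
For anyone who hits the same hang and wants to check whether the RoCE/InfiniBand transport is involved, here is a minimal sketch. It is my own assumption, not something the Nvidia engineer suggested: turn on NCCL logging and disable the IB/RoCE transport before the process group is created, so NCCL falls back to plain sockets. `NCCL_DEBUG` and `NCCL_IB_DISABLE` are standard NCCL environment variables; the helper name `configure_nccl_for_diagnosis` is hypothetical.

```python
import os


def configure_nccl_for_diagnosis(disable_ib: bool = True) -> dict:
    """Set NCCL environment variables; call before the process group is created."""
    # NCCL_DEBUG=INFO makes NCCL log which transport it selects and any errors.
    settings = {"NCCL_DEBUG": "INFO"}
    if disable_ib:
        # Assumption: the hang comes from the InfiniBand/RoCE transport, so we
        # disable it and let NCCL fall back to IP sockets (slower, but a useful test).
        settings["NCCL_IB_DISABLE"] = "1"
    os.environ.update(settings)
    return settings


applied = configure_nccl_for_diagnosis()
# After this, initialize as usual, e.g. torch.distributed.init_process_group(backend="nccl"),
# and then wrap the model with DistributedDataParallel.
```

If the model wraps fine with the IB transport disabled, that points to the network hardware rather than the training script. This is only a diagnostic; it does not make unsupported hardware work with RoCE.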

Thanks anyway for your attention ~