Specifying ports to be used in PyTorch multi-node distributed training


I am having problems using torch.distributed for training in a multi-node setup. Apart from the specified master port, it looks like PyTorch tries to open random ports for inter-node communication. In my setup, only a limited number of specific ports are open. Is there some way I can force PyTorch to use only the given ports for inter-node communication?
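
For context, here is a minimal sketch of how the master port is usually pinned, assuming the `env://` initialization method of `torch.distributed`. The helper names `rendezvous_env` and `init_distributed`, the address `10.0.0.1`, and the port `29500` are placeholders for illustration, not values from my actual setup. As far as I can tell, this only fixes the rendezvous port; it does not constrain the other connections described above.

```python
import os


def rendezvous_env(master_addr, master_port, rank, world_size):
    """Build the environment variables read by the env:// init method.

    Hypothetical helper: only the variable names (MASTER_ADDR, MASTER_PORT,
    RANK, WORLD_SIZE) come from torch.distributed; everything else is
    illustrative.
    """
    if not 0 < int(master_port) < 65536:
        raise ValueError(f"invalid port: {master_port}")
    return {
        "MASTER_ADDR": str(master_addr),
        "MASTER_PORT": str(master_port),
        "RANK": str(rank),
        "WORLD_SIZE": str(world_size),
    }


def init_distributed(backend="gloo"):
    """Join the process group using the variables set in os.environ."""
    # Imported here so the helper above can be used without torch installed.
    import torch.distributed as dist

    # Blocks until all WORLD_SIZE ranks have connected to the master port.
    dist.init_process_group(backend=backend, init_method="env://")


if __name__ == "__main__":
    # Placeholder values; each node would use its own rank.
    os.environ.update(rendezvous_env("10.0.0.1", 29500, rank=0, world_size=2))
    init_distributed()
```

Every node is started with the same `MASTER_ADDR` and `MASTER_PORT` and a distinct `RANK`.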


We have the same problem…