Specifying ports to use in PyTorch multi-node distributed training

sarthak_garg · July 30, 2019, 11:19pm

Hello,

I am having problems using torch.distributed training in a multi-node setup. Apart from the specified master port, it looks like PyTorch opens random ports for inter-node communication. In my setup, only a limited set of specified ports is open. Is there a way to force PyTorch to use only the given ports for inter-node communication?

Thanks!

vadimkantorov · September 11, 2020, 7:53am

We have the same problem…