Confusion about 'master_port'

Hi there. Recently I have been using multiple CPU cores for training. On my own PC, a 2017 MacBook (1 CPU, 4 cores), I just set os.environ['MASTER_PORT'] to a single value, and multiple processes could run on the same machine. However, when I migrated the code to a cluster in order to use more cores, I needed to give a different value to os.environ['MASTER_PORT'] for each process. Otherwise, I get the permission denied error below.

store = TCPStore(master_addr, master_port, world_size, start_daemon, timeout)
RuntimeError: Permission denied

I don’t understand the reason for this. Could someone explain it?

Hey @Meraki, MASTER_PORT needs to be set to the same value for all processes; otherwise, they cannot rendezvous correctly. This error might be caused by some other configuration. It might help to print the values of the following environment variables on all processes right before init_process_group is called: MASTER_ADDR, MASTER_PORT, RANK, WORLD_SIZE.
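
For anyone who wants to follow this suggestion, here is a minimal sketch of such a debug print. The helper names (rendezvous_env, format_rendezvous_env) are made up for illustration and are not part of PyTorch; the idea is just to dump the four variables in every process and compare the output lines.

```python
import os

# The four variables mentioned in the reply above.
RENDEZVOUS_VARS = ("MASTER_ADDR", "MASTER_PORT", "RANK", "WORLD_SIZE")


def rendezvous_env(environ=None):
    """Return the rendezvous variables as a dict; unset ones map to None.

    `environ` is a hypothetical hook for testing; by default os.environ is read.
    """
    env = os.environ if environ is None else environ
    return {name: env.get(name) for name in RENDEZVOUS_VARS}


def format_rendezvous_env(environ=None):
    """Render the variables on one line so per-process output is easy to compare."""
    values = rendezvous_env(environ)
    return " ".join(f"{name}={value}" for name, value in values.items())


# Place this right before the call to init_process_group in each process.
print(f"[pid {os.getpid()}] {format_rendezvous_env()}", flush=True)
```

MASTER_ADDR, MASTER_PORT and WORLD_SIZE should be identical in every process, while RANK should differ.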

Yep, you are right. I just figured out that the reason is that on Linux you need root permissions to open a port below 1024. That’s why permission was denied.
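
In case it helps others, here is a rough sketch of a guard that catches this before the TCPStore is created. It assumes the default Linux behaviour described above (ports below 1024 need root). The function names are my own, and 29500 is only an example of an unprivileged port, not a required value.

```python
import os

# Assumption: default Linux setup, where ports below 1024 need root to bind.
PRIVILEGED_PORT_LIMIT = 1024
MAX_PORT = 65535


def is_privileged_port(port):
    """True if binding `port` would normally require root on Linux."""
    return 0 < int(port) < PRIVILEGED_PORT_LIMIT


def check_master_port(port):
    """Return the port as an int, or raise ValueError if it is unusable."""
    port = int(port)
    if not 0 < port <= MAX_PORT:
        raise ValueError(f"MASTER_PORT {port} is not a valid TCP port")
    if is_privileged_port(port):
        raise ValueError(
            f"MASTER_PORT {port} is below {PRIVILEGED_PORT_LIMIT}; "
            "pick a higher port or run as root"
        )
    return port


# Use the same unprivileged port in every process (example value).
os.environ["MASTER_PORT"] = str(check_master_port(29500))
```

The port still has to be the same in all processes, as explained in the reply above; this only checks that it is one a non-root user can open.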

Thanks for your help, @mrshenli.
