Why do we need to set local_rank = int(os.environ["LOCAL_RANK"]) in torchrun?

Brando_Miranda · February 3, 2022, 8:41pm

I saw that we need to do:

local_rank = int(os.environ["LOCAL_RANK"])

but we never set that env variable ourselves, which seems odd. Does torchrun or torch.distributed.launch set it by itself?

xksteven · February 17, 2023, 2:56am

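Yes, the launcher sets it for you. PyTorch creates one worker process per GPU. For each one it either passes the local rank as a --local_rank command-line argument (the default for torch.distributed.launch) or sets the LOCAL_RANK environment variable (what torchrun does) before starting the process. It then does the same with the next rank for the next GPU, and so on.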
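To make the mechanism concrete, here is a toy sketch of a launcher that gives each child process its own LOCAL_RANK. This is not torchrun's actual code: toy_launch and WORKER are made-up names, and the workers run one after another here, whereas a real launcher starts them concurrently.

```python
import os
import subprocess
import sys

# Child code: each worker just reports the LOCAL_RANK it inherited.
WORKER = "import os; print(os.environ['LOCAL_RANK'])"


def toy_launch(nproc_per_node):
    """Spawn one worker per local rank (toy stand-in, not torchrun's real code)."""
    outputs = []
    for local_rank in range(nproc_per_node):
        # Copy the parent's environment and add a rank specific to this child.
        env = dict(os.environ, LOCAL_RANK=str(local_rank))
        result = subprocess.run(
            [sys.executable, "-c", WORKER],
            env=env,
            capture_output=True,
            text=True,
            check=True,
        )
        outputs.append(int(result.stdout.strip()))
    return outputs


if __name__ == "__main__":
    print(toy_launch(2))
```

The point is that the variable is set by the parent in the child's environment, so the training script only ever has to read it.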

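You can see this for yourself: launch with torch.distributed.launch and leave --local_rank out of the arguments your script defines before calling parse_args. argparse then fails with an unrecognized-argument error, which shows PyTorch passing it in.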
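If you want a script that works under either launcher, one option is to check the environment variable first and fall back to the command-line argument. A minimal sketch, assuming only the standard library: resolve_local_rank is a hypothetical helper (not a PyTorch API), the fallback to 0 is my assumption for a plain single-process run, and the --local-rank spelling is accepted as well because, as far as I know, some PyTorch versions pass the dashed form.

```python
import argparse
import os


def resolve_local_rank(argv=None, environ=None):
    """Return this worker's local rank (hypothetical helper, not a PyTorch API)."""
    environ = os.environ if environ is None else environ

    parser = argparse.ArgumentParser()
    # Accept both spellings so the launcher-injected argument is never "unrecognized".
    parser.add_argument(
        "--local_rank", "--local-rank", dest="local_rank", type=int, default=None
    )
    # parse_known_args ignores any other arguments the script may receive.
    args, _unknown = parser.parse_known_args(argv)

    if "LOCAL_RANK" in environ:
        # Set by the launcher in the worker's environment.
        return int(environ["LOCAL_RANK"])
    if args.local_rank is not None:
        # Passed by the launcher on the command line.
        return args.local_rank
    return 0  # assumption: no launcher, so treat this as a single-process run


if __name__ == "__main__":
    print(f"local_rank = {resolve_local_rank()}")
```

Run under torchrun, this takes the first branch; run under torch.distributed.launch without --use_env, it takes the second.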