How Do I Obtain GROUP_RANK and LOCAL_WORLD_SIZE in Code?

When I start DDP with torchrun, I can read both values from environment variables:

import os

print("env GROUP_RANK", os.environ["GROUP_RANK"])
print("env LOCAL_WORLD_SIZE", os.environ["LOCAL_WORLD_SIZE"])

But I can’t assume that my code always runs under torchrun. For example, when the processes are launched with torch.multiprocessing.spawn, these two environment variables are not set. Is there an API, similar to torch.distributed.get_global_rank, that returns both values reliably regardless of the launcher? If not, should we provide a new one?
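To make the question concrete, here is a minimal sketch of the kind of fallback I have in mind. It is not an existing PyTorch API: the names resolve_node_info and fallback_local_world_size are hypothetical, and the derived group rank is only valid under the assumption that every node runs the same number of processes and that global ranks are assigned contiguously, node by node.

```python
import os


def resolve_node_info(rank, fallback_local_world_size, env=None):
    """Return (group_rank, local_world_size) for the calling process.

    Hypothetical helper, not part of torch. Prefers the variables that
    torchrun sets and otherwise derives the values from the global rank.
    """
    env = os.environ if env is None else env

    # torchrun exports LOCAL_WORLD_SIZE; otherwise use the caller's value.
    local_world_size = int(env.get("LOCAL_WORLD_SIZE", fallback_local_world_size))

    if "GROUP_RANK" in env:
        group_rank = int(env["GROUP_RANK"])
    else:
        # ASSUMPTION: homogeneous nodes with contiguous global ranks per node.
        group_rank = rank // local_world_size

    return group_rank, local_world_size
```

With torch.multiprocessing.spawn, rank could come from torch.distributed.get_rank() after the process group is initialized, and fallback_local_world_size could be the nprocs value passed to spawn. If the nodes are not homogeneous, the derived group rank would be wrong.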

I found that my question had been raised before, but it seems that the problem has not been solved head-on.

One option could be to use:


Would there be something wrong with that solution?