When using torchrun with elasticity, nodes can join or leave the group.
I want to know the current state of the environment, and I found `torch.distributed.get_world_size()` and `torch.distributed.get_rank()`.
I am not sure, but these two functions seem to return values that reflect the current state.
However, I can't find a way to get the current number of nodes.
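For context, the closest thing I can think of is deriving the node count from the `WORLD_SIZE` and `LOCAL_WORLD_SIZE` environment variables that torchrun sets for each worker. This is only a rough sketch: `estimate_num_nodes` is a hypothetical helper of my own, not a PyTorch API, and it assumes every node runs the same number of workers and that the values read at process start are the ones of interest.

```python
import os


def estimate_num_nodes(env=None):
    """Hypothetical helper (not part of PyTorch): estimate the node count.

    Assumes every node launches the same number of workers, so that
    WORLD_SIZE is an exact multiple of LOCAL_WORLD_SIZE.
    """
    env = os.environ if env is None else env
    world_size = int(env["WORLD_SIZE"])              # total workers in the job
    local_world_size = int(env["LOCAL_WORLD_SIZE"])  # workers on this node
    return world_size // local_world_size
```

Inside a worker launched by torchrun this would be called as `estimate_num_nodes()`, but I don't know whether this is the intended way to do it.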
My questions are:
- Do `get_world_size()` and `get_rank()` return values for the current state?
- Is there a function that returns the number of nodes?
Thanks!