I’m writing a data loader that is very IO and memory intensive, and as such, I want only one worker per node handling the data loading and scattering the results to the other local workers.
To do this, I need to know the RANKs of all the other workers in my local worker group. So my questions are:
Given the RANK of a worker whose LOCAL_RANK is 0, can I assume that the RANKs of the other local workers will always be RANK+1, RANK+2, …, RANK+(LOCAL_WORLD_SIZE-1)? Where is this documented?
If not, then how can I map the LOCAL_RANKs 1, 2, …, (LOCAL_WORLD_SIZE-1) to RANK values?
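To make the second question concrete, here is a minimal sketch of the kind of mapping I have in mind, one that does not assume contiguous RANKs. It assumes every worker has already shared a `(rank, local_rank, node_id)` record with all others (for example via `torch.distributed.all_gather_object`); the function name `local_group_ranks`, the record layout, and the node identifiers are hypothetical and only for illustration.

```python
from collections import defaultdict


def local_group_ranks(records, my_rank):
    """Return the global RANKs on my node, ordered by LOCAL_RANK.

    records: list of (rank, local_rank, node_id) tuples, one per worker.
    This layout is an assumption for illustration; node_id could be a hostname.
    """
    by_node = defaultdict(dict)  # node_id -> {local_rank: rank}
    node_of = {}                 # rank -> node_id
    for rank, local_rank, node_id in records:
        by_node[node_id][local_rank] = rank
        node_of[rank] = node_id
    peers = by_node[node_of[my_rank]]
    return [peers[lr] for lr in sorted(peers)]


# Hypothetical gathered records for 2 nodes with 2 workers each.
records = [(0, 0, "node-a"), (1, 1, "node-a"), (2, 0, "node-b"), (3, 1, "node-b")]
print(local_group_ranks(records, my_rank=2))
```

The first entry of the returned list would be the RANK of the worker with LOCAL_RANK 0, which is the one that would do the loading and scatter to the rest. The cost is one extra collective at startup, so I would still prefer a documented guarantee if one exists.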