Hi, say I have a dataset that is passed into a DataLoader, and I run inference by iterating through the loader.
Suppose I use a SequentialSampler on a single GPU/TPU with a batch size of, say, 32.
The 32 predicted labels in each batch then follow the ascending order of the samples in the dataset, so I know which dataset index each predicted label refers to. Stacking the batch outputs until the end gives me predicted labels in the same order as the original dataset's indices.
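To make the single-device case concrete, here is a minimal sketch in plain Python (no torch, just mimicking what a SequentialSampler-backed DataLoader yields; the function name and the echo "model" are made up for illustration): batch b always covers indices [b * batch_size, (b + 1) * batch_size), so stacking preserves dataset order.

```python
def sequential_batches(dataset_len, batch_size):
    """Yield the index lists a SequentialSampler-style loader would produce."""
    indices = list(range(dataset_len))  # ascending order, no shuffling
    for start in range(0, dataset_len, batch_size):
        yield indices[start:start + batch_size]

# Pretend "inference" just echoes the index; stacking keeps dataset order.
preds = []
for batch in sequential_batches(dataset_len=100, batch_size=32):
    preds.extend(batch)  # a real model(batch) call would go here

assert preds == list(range(100))  # predictions line up with dataset indices
```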
Now, if I use distributed training to speed up inference, how do I know which sample indices each GPU/TPU takes from the dataset, so that after getting the predicted labels I can merge them back into the original dataset? Some GPUs/TPUs may finish faster than others, which could scramble the order of the indices. And is it even guaranteed that batch 1 goes to GPU 0, batch 2 to GPU 1, and so on?
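For what it's worth, my current understanding (sketched below in plain Python, mimicking what I believe `torch.utils.data.DistributedSampler` does with `shuffle=False`) is that each rank takes every world_size-th index, padded so all ranks get equal counts. So no rank sees a contiguous block, and outputs have to be scattered back by index. The helper name `distributed_shard` is mine, not a real API:

```python
def distributed_shard(dataset_len, world_size, rank):
    """Indices a given rank would see (interleaved, padded to equal length).

    This mirrors my understanding of DistributedSampler with shuffle=False:
    pad by repeating indices from the front, then take every world_size-th one.
    """
    indices = list(range(dataset_len))
    pad = (-dataset_len) % world_size   # make total divisible by world_size
    indices += indices[:pad]
    return indices[rank::world_size]

# If each rank records the indices it processed, merging back by index
# restores the original dataset order regardless of which rank finished first.
dataset_len, world_size = 10, 4
merged = [None] * dataset_len
for rank in range(world_size):
    for idx in distributed_shard(dataset_len, world_size, rank):
        merged[idx] = idx  # in real code: merged[idx] = prediction for sample idx

assert distributed_shard(10, 4, 0) == [0, 4, 8]   # rank 0's shard, not a block
assert merged == list(range(10))                   # order recovered by index
```

If this mental model is right, the fix would be to gather (index, prediction) pairs from all ranks and sort or scatter by index, rather than assuming any ordering across devices.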