How to get the sample indices in inference?

gabrielwong1991 · August 18, 2021, 1:11am

Hi, say I have a dataset that is passed to a data loader, and I iterate through the data loader to run inference with the model.

Say you use SequentialSampler on a single GPU/TPU with a batch size of, say, 32.

You know that the 32 predicted labels in each batch follow the ascending order of the samples in the dataset, so you know which dataset index each predicted label refers to. You stack the predictions from each iteration until the end, and the resulting predicted labels follow the order of the original dataset’s indices.

Now, if you use a distributed setup to speed up inference, how do you know which sample indices each GPU/TPU takes from the dataset, so that after getting the predicted labels I can merge them back into the original dataset? Some GPUs/TPUs may finish sooner than others, which could scramble the order of the indices. Or is it not even guaranteed that batch 1 goes to GPU0, batch 2 to GPU1, and so on?

mrshenli · August 24, 2021, 3:01am

cc @VitalyFedyunin for data loader questions

VitalyFedyunin · September 1, 2021, 8:18pm

As an option, modify your dataset so that each item it returns includes the sample's index alongside the data.
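
A minimal sketch of that idea, under some assumptions: `WithIndex` and `predict_shard` are hypothetical names made up for illustration, the `Linear` layer stands in for a real model, and the three "ranks" are simulated in one process by passing `num_replicas` and `rank` to `DistributedSampler` explicitly. In an actual multi-process run each rank would compute only its own shard, and the indices and predictions would have to be collected across processes (for example with `torch.distributed.all_gather`) before merging. Note that `DistributedSampler` pads the index list so every rank gets the same number of samples, so a few indices can show up twice.

```python
import torch
from torch.utils.data import DataLoader, Dataset, TensorDataset
from torch.utils.data.distributed import DistributedSampler


class WithIndex(Dataset):
    """Hypothetical wrapper: returns (index, *sample) instead of just the sample."""

    def __init__(self, base):
        self.base = base

    def __len__(self):
        return len(self.base)

    def __getitem__(self, idx):
        sample = self.base[idx]
        if not isinstance(sample, tuple):
            sample = (sample,)
        return (idx, *sample)


torch.manual_seed(0)

# Toy data: 10 samples with 4 features each.
features = torch.arange(40, dtype=torch.float32).reshape(10, 4)
dataset = WithIndex(TensorDataset(features))
model = torch.nn.Linear(4, 3)  # stand-in for a real model
model.eval()


def predict_shard(rank, world_size, batch_size=4):
    """Run inference on the shard that `rank` would see; return (indices, preds)."""
    # Passing num_replicas and rank explicitly means no process group is needed here.
    sampler = DistributedSampler(
        dataset, num_replicas=world_size, rank=rank, shuffle=False
    )
    loader = DataLoader(dataset, batch_size=batch_size, sampler=sampler)
    all_idx, all_pred = [], []
    with torch.no_grad():
        for idx, x in loader:  # default collation turns the int indices into a tensor
            all_idx.append(idx)
            all_pred.append(model(x).argmax(dim=1))
    return torch.cat(all_idx), torch.cat(all_pred)


# Simulate 3 ranks in one process, then merge by index.
world_size = 3
shards = [predict_shard(r, world_size) for r in range(world_size)]
idx = torch.cat([s[0] for s in shards])
pred = torch.cat([s[1] for s in shards])

# Scatter predictions back to dataset order. Padded duplicates refer to the
# same sample, so whichever copy is written last carries the same label.
merged = torch.empty(len(dataset), dtype=pred.dtype)
merged[idx] = pred
```

Because every prediction travels with its own index, the result does not depend on which device processed which batch or on the order in which devices finish.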

gabrielwong1991 · September 1, 2021, 8:32pm

Exactly what I did! Thanks all
