Can I also get data indices using dataloader?

Hello, everyone! In my project, I need to know the indices of the sampled data points in the training set. But I don’t know how to do this.

I tried a workaround, shown below, that sets shuffle to False. With that setting I should not need the data indices, because the sampled data will simply have the same order as in the training set. But it seems the dataloader still reorders the samples.

import torch

# shuffle=False makes the loader visit the dataset in index order
loader = torch.utils.data.DataLoader(
    trainset, batch_size=args.train_batch, shuffle=False, num_workers=args.workers
)
for i, (inputs, targets) in enumerate(loader):
    # Compare the first sample of batch i with the dataset item at the same position
    print(torch.sum(torch.abs(
        inputs[0] - trainset[i * args.train_batch][0]
    )))

If the dataloader returned the data in exactly the same order as the training set, this snippet should print 0s. But it prints non-zero values.

Hello :sunny:

The best way would be to create your own dataset class. In it you could return your images (or whatever data you have) plus your index for that data item. Try this and report back if you have a hard time creating your dataset :slight_smile:
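A minimal sketch of that idea, in case it helps. The wrapper name `IndexedDataset` and the toy `TensorDataset` are made up for illustration; the sketch assumes your real training set is a map-style dataset whose items are `(input, target)` pairs, so adapt it if yours returns something else.

```python
import torch
from torch.utils.data import DataLoader, Dataset, TensorDataset


class IndexedDataset(Dataset):
    """Hypothetical wrapper: returns (input, target, index) for each item."""

    def __init__(self, base):
        self.base = base  # assumed: map-style dataset yielding (input, target)

    def __len__(self):
        return len(self.base)

    def __getitem__(self, index):
        inputs, target = self.base[index]
        return inputs, target, index


# Toy stand-in for the real training set: 10 samples with 3 features each.
base = TensorDataset(
    torch.arange(30, dtype=torch.float32).view(10, 3),
    torch.arange(10),
)
loader = DataLoader(IndexedDataset(base), batch_size=4, shuffle=True)

seen = []
for inputs, targets, indices in loader:
    # The default collate function stacks the integer indices into a tensor,
    # so each row of `inputs` can be traced back to its dataset position.
    assert torch.equal(inputs, base.tensors[0][indices])
    seen.extend(indices.tolist())
```

Because the index travels with each sample, this works with shuffle on or off. One guess about the non-zero differences in the original check: if the training set applies random transforms, two reads of the same index can return different tensors even when the order is unchanged. The transforms are not shown in the question, so treat that as a possibility to rule out, not a diagnosis.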


Thank you for your help. This solves my problem :smile:
