Dataloader from Subset: how to directly get the i-th batch values

kuzand · March 3, 2020, 1:47pm

I am creating a dataloader from a Subset as follows:

dataset = torch.utils.data.TensorDataset(torch.rand(100))
train_subset, test_subset = torch.utils.data.random_split(dataset, [80, 20])
train_loader = torch.utils.data.DataLoader(train_subset, batch_size = 10)
test_loader = torch.utils.data.DataLoader(test_subset, batch_size = 10)

How can we get the i-th batch directly, without a for-loop? I am doing the following:

train_loader.dataset.dataset[train_loader.dataset.indices][0][list(train_loader.batch_sampler)[i]]

which looks ugly and I was wondering if there is a better way.

Oli · March 3, 2020, 7:58pm

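You can get a sample at index i with train_subset[i]. Since your dataloaders aren't shuffled, the i-th batch covers the indices from i * batch_size up to (but not including) (i + 1) * batch_size, so you can compute that range and slice the dataset:

train_subset[i * batch_size:(i + 1) * batch_size]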
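A minimal sketch of the slicing idea, treating i as a batch number and assuming the loader is unshuffled with the default sequential sampler. The helper name get_batch is hypothetical. Slicing a Subset like this relies on the underlying dataset (here a TensorDataset) accepting a list of indices, which many custom datasets do not.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, random_split

torch.manual_seed(0)
dataset = TensorDataset(torch.rand(100))
train_subset, test_subset = random_split(dataset, [80, 20])

batch_size = 10
train_loader = DataLoader(train_subset, batch_size=batch_size, shuffle=False)


def get_batch(subset, i, batch_size):
    """Hypothetical helper: slice out the i-th batch of an unshuffled loader."""
    start = i * batch_size
    # The slice is applied to subset.indices, and the resulting list of
    # indices is then passed to the underlying TensorDataset.
    return subset[start:start + batch_size]


i = 3
sliced = get_batch(train_subset, i, batch_size)[0]  # first (only) tensor
looped = list(train_loader)[i][0]                   # same batch via iteration
print(torch.equal(sliced, looped))
```

The comparison against list(train_loader)[i] is only there as a sanity check; it iterates the whole loader, which is what the slicing avoids.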

kuzand · March 3, 2020, 9:43pm

What if, instead of the TensorDataset (which I used to simplify the example), I have an ImageDataset and, moreover, I use a sampler (WeightedRandomSampler) for the dataloader?

Oli · March 4, 2020, 4:27pm

I don’t know how to solve that in a few lines of code. If you can get hold of the indices for the i-th batch, you can easily slice it out of a custom dataset (ImageDataset in your case, I guess).

I’ve written a sampler before in which I shuffled the indices manually, which allowed me to do something similar to what you describe. Perhaps you could wrap the WeightedRandomSampler class and do something similar yourself?
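One way the wrapping idea might look, as a rough sketch rather than a tested recipe for any particular dataset: a sampler that draws its indices from the WeightedRandomSampler once and replays them, so the indices of the i-th batch are known in advance. The names FrozenSampler and get_batch are hypothetical, a TensorDataset stands in for the image dataset, and the sketch assumes drop_last=False and the same order every epoch (call the wrapper again to redraw).

```python
import torch
from torch.utils.data import (DataLoader, Sampler, TensorDataset,
                              WeightedRandomSampler)


class FrozenSampler(Sampler):
    """Hypothetical wrapper: draw indices from a sampler once, then replay them."""

    def __init__(self, sampler):
        self.indices = list(sampler)  # one random draw, stored for later lookup

    def __iter__(self):
        return iter(self.indices)

    def __len__(self):
        return len(self.indices)


def get_batch(loader, i):
    """Hypothetical helper: build the i-th batch from the stored indices."""
    bs = loader.batch_size
    idx = loader.sampler.indices[i * bs:(i + 1) * bs]
    # Fetch samples one by one, so the dataset only needs integer indexing.
    return loader.collate_fn([loader.dataset[j] for j in idx])


torch.manual_seed(0)
dataset = TensorDataset(torch.rand(100))  # stand-in for an image dataset
weights = torch.rand(100)                 # arbitrary per-sample weights
weighted = WeightedRandomSampler(weights, num_samples=80, replacement=True)
loader = DataLoader(dataset, batch_size=10, sampler=FrozenSampler(weighted))

i = 2
direct = get_batch(loader, i)[0]
looped = list(loader)[i][0]  # sanity check only; iterates the whole loader
print(torch.equal(direct, looped))
```

Because samples are fetched one at a time and passed through the loader's own collate_fn, this does not depend on the dataset supporting slicing, at the cost of freezing the sampling order until the sampler is rebuilt.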