Random batch sampling from DataLoader

Hello, I’m interested in whether it’s possible to randomly sample X batches of data from a DataLoader object for each epoch. Is there any way of accessing the batches by index? Or something similar to achieve such behavior? Thank you for the help.

I guess what you’re asking is the following:

  • If you set shuffle=True on the DataLoader, you get random samples in your batches. I mean the batches will be created with random samples of the dataset instead of sequential ones.
  • For accessing the batches by index, try for batch, (data, target) in enumerate(loader), so the batch variable will store the index of the batch (I think it works that way).
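A minimal sketch of the two suggestions above, assuming a toy TensorDataset (the data, batch size, and variable names are made up for illustration). It shows that the counter from enumerate is only the position of the batch within the current pass, not a handle for fetching a particular batch:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Toy dataset (hypothetical): 20 samples with 3 features and integer labels.
features = torch.arange(60, dtype=torch.float32).reshape(20, 3)
targets = torch.arange(20)
dataset = TensorDataset(features, targets)

# shuffle=True draws a new random permutation each time the loader is iterated.
loader = DataLoader(dataset, batch_size=4, shuffle=True)

seen = []
for batch_idx, (data, target) in enumerate(loader):
    # batch_idx counts batches in this pass (0, 1, 2, ...); the samples
    # inside batch 0 will generally differ from one pass to the next.
    seen.append((batch_idx, target.tolist()))
```

Iterating the loader a second time would yield the same batch indices but, in general, different samples in each batch.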

Thank you for your reply. I already tried this solution; the problem is that for each epoch (training loop) I want to select a small subset of batches (let’s say 5 batches). If I simply enumerate through the loader (enumerate(loader)), I will get the same order (the shuffled version from the data loader).

I am interested if something like this is possible (or some other way to achieve the same behaviour):

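import numpy as np
import torch
data_loader = torch.utils.data.DataLoader(data, shuffle=True)
for epoch in range(100):
    random_ids = np.random.randint(len(data_loader), size=5)
    # Desired behaviour only: DataLoader does not support indexing like this
    batches = data_loader[random_ids]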

Sorry then! I don’t know how to help you with that :grimacing:

No worries, I’ve been struggling with this for a while.

If you want to sample only a specific subset using predefined indices, you could create a new DataLoader with a SubsetRandomSampler or wrap the Dataset into a Subset.
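A rough sketch of the SubsetRandomSampler idea, assuming you are fine with building a new DataLoader every epoch. The toy dataset, batch_size, batches_per_epoch, and the use of torch.randperm to pick the indices are illustrative assumptions, not something from the thread:

```python
import torch
from torch.utils.data import DataLoader, SubsetRandomSampler, TensorDataset

# Toy dataset (hypothetical): 100 samples, labels 0..99.
dataset = TensorDataset(torch.randn(100, 3), torch.arange(100))

batch_size = 4
batches_per_epoch = 5  # the "5 batches" from the question

epoch_targets = []
for epoch in range(3):
    # Draw a fresh random set of sample indices, enough for 5 full batches.
    n_samples = batches_per_epoch * batch_size
    subset_indices = torch.randperm(len(dataset))[:n_samples].tolist()

    # The sampler yields only these indices, in random order.
    # sampler and shuffle=True cannot be combined, so shuffle stays at its default.
    sampler = SubsetRandomSampler(subset_indices)
    loader = DataLoader(dataset, batch_size=batch_size, sampler=sampler)

    # Collect the labels of every batch produced in this epoch.
    epoch_targets.append([target.tolist() for _, target in loader])
```

Wrapping the dataset as torch.utils.data.Subset(dataset, subset_indices) and passing it to a DataLoader with shuffle=True should behave much the same way. Note that this samples individual items rather than picking 5 of the loader's existing batches, which is usually what you want anyway.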