Select a sequential subset of Dataset

Hello, I would like to implement a loop where iteration by iteration I select part of the dataset and train on it.
What is the best way to select the first x samples of my dataset? Something like this: train_dataset[0:300], train_dataset[300:600]

Another idea is to make the selection random, but without repeating indices across iterations. So the idea is to draw x random indices from the range 0 to len(train_dataset) and remove them from the pool after each iteration. But for that, I would need to index the train_dataset with an array, train_dataset[[1, 2, 3]], for example.

Any idea on how to do something like this?


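You can do this with a DataLoader. For your first option, pass batch_size=300 and shuffle=False. For your second option, set shuffle=True; each sample is then drawn exactly once per epoch, in random order, so nothing repeats within an epoch.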
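As a rough sketch of the DataLoader approach: the snippet below builds a toy TensorDataset as a stand-in for train_dataset (the 1000 samples and 8 features are made up for illustration) and creates one loader per option. Your real dataset and batch size will differ.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical stand-in for train_dataset: 1000 samples, 8 features each.
# The label of sample i is simply i, so the order is easy to inspect.
features = torch.arange(1000 * 8, dtype=torch.float32).reshape(1000, 8)
labels = torch.arange(1000)
train_dataset = TensorDataset(features, labels)

# Option 1: shuffle=False yields samples 0-299, then 300-599, and so on.
sequential_loader = DataLoader(train_dataset, batch_size=300, shuffle=False)

# Option 2: shuffle=True visits every sample once per epoch in random order.
random_loader = DataLoader(train_dataset, batch_size=300, shuffle=True)

# Collect only the labels of each batch to see which samples were selected.
sequential_batches = [batch_labels for _, batch_labels in sequential_loader]
random_batches = [batch_labels for _, batch_labels in random_loader]
```

Note that the last batch is smaller (100 samples here) because 1000 is not a multiple of 300; pass drop_last=True if you prefer to discard it.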

If you do not want to use a DataLoader, you can do the following instead.

  1. If you created the PyTorch dataset yourself, you can slice the underlying data directly with [0:300] to select your x samples. Slicing the dataset object itself only works if its __getitem__ accepts slices.
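  2. To get random samples:
    • First call torch.randperm(len(dataset)). This gives you a random permutation of the integers 0 to len(dataset) - 1, which you can use as indexes.
    • Then take consecutive chunks of x indexes from this permutation, one chunk per iteration, and select the samples at those indexes. Since every index appears exactly once in the permutation, no sample repeats across iterations.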
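Both options in the list above can be sketched without a DataLoader. This version uses torch.utils.data.Subset, which wraps any map-style dataset together with a list of indexes, so it does not depend on the dataset supporting slices or index arrays. That is a swapped-in helper rather than the direct slicing described above, and the toy dataset and chunk_size are assumptions for illustration.

```python
import torch
from torch.utils.data import Subset, TensorDataset

# Hypothetical stand-in for train_dataset; the label of sample i is i.
train_dataset = TensorDataset(torch.randn(1000, 8), torch.arange(1000))
chunk_size = 300  # the "x" from the question
num_samples = len(train_dataset)

# Option 1: sequential chunks [0:300], [300:600], ...
sequential_chunks = [
    Subset(train_dataset, range(start, min(start + chunk_size, num_samples)))
    for start in range(0, num_samples, chunk_size)
]

# Option 2: random chunks without repetition. Each index occurs exactly
# once in the permutation, so the chunks never overlap.
permutation = torch.randperm(num_samples)
random_chunks = [
    Subset(train_dataset, index_chunk.tolist())
    for index_chunk in permutation.split(chunk_size)
]

# Each chunk is itself a dataset: train on it directly, or wrap it in a
# DataLoader if you want mini-batches within the chunk.
for iteration, chunk in enumerate(random_chunks):
    print(f"iteration {iteration}: {len(chunk)} samples")
```

Calling torch.randperm again gives a fresh permutation, for example at the start of each pass over the whole dataset.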