Select a sequential subset of Dataset

verance · May 21, 2020, 1:38pm

Hello, I would like to implement a loop where iteration by iteration I select part of the dataset and train on it.
What is the best way to select the first x samples of my dataset? Something like this: train_dataset[0:300], train_dataset[300:600]

Another idea is to make that choice random, but without repeating indices across iterations. So the idea is to select x random indices from the range 0 to len(train_dataset) - 1 and remove them from the pool after each iteration. But for that, I would need to index train_dataset with an array of indices, train_dataset[[1, 2, 3]], for example.

Any idea on how to do something like this?

Regards

Kushaj · May 21, 2020, 6:24pm

You can do this with a DataLoader. For your first option, pass batch_size=300 and shuffle=False. For your second option, set shuffle=True, which visits every sample exactly once per epoch in random order.
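A minimal sketch of both options, assuming a toy TensorDataset stands in for your train_dataset (the names features, labels, sequential_loader and random_loader are only illustrative):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Toy stand-in for train_dataset (an assumption): 1000 samples, 4 features each.
# The label of each sample equals its index, so the order is easy to inspect.
features = torch.arange(1000 * 4, dtype=torch.float32).reshape(1000, 4)
labels = torch.arange(1000)
train_dataset = TensorDataset(features, labels)

# Option 1: shuffle=False yields samples 0-299, then 300-599, and so on.
sequential_loader = DataLoader(train_dataset, batch_size=300, shuffle=False)

# Option 2: shuffle=True draws a new permutation every epoch,
# so no index is repeated within one pass over the data.
random_loader = DataLoader(train_dataset, batch_size=300, shuffle=True)

# Collect the labels of each batch to see which samples were selected.
sequential_chunks = [batch_labels for _, batch_labels in sequential_loader]
random_chunks = [batch_labels for _, batch_labels in random_loader]
```

Note that the last batch is smaller when the dataset size is not a multiple of batch_size (here 100 samples); pass drop_last=True if you want to skip it.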

If you do not want to use a DataLoader, you can do the following.

If you create a PyTorch dataset that stores its samples in a data attribute (some torchvision datasets, such as MNIST, do), you can access the underlying data with dataset.data[0:300]. This way you can select your x samples.
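Not every dataset has a data attribute, so here is a rough alternative using torch.utils.data.Subset, which wraps any map-style dataset. The toy train_dataset and the chunk_size name are assumptions for illustration:

```python
import torch
from torch.utils.data import Subset, TensorDataset

# Toy stand-in for train_dataset (an assumption): label i belongs to sample i.
train_dataset = TensorDataset(torch.randn(1000, 4), torch.arange(1000))
chunk_size = 300

# One Subset per consecutive block of indices: [0, 300), [300, 600), ...
# The final block is clipped to the dataset length.
chunks = [
    Subset(train_dataset, range(start, min(start + chunk_size, len(train_dataset))))
    for start in range(0, len(train_dataset), chunk_size)
]
```

Each element of chunks behaves like a dataset of its own, so you can train on it directly or wrap it in a DataLoader inside your loop.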
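To get random samples:
- First call torch.randperm(len(dataset)). This gives you a random permutation of the integers 0 to len(dataset) - 1, which you will use as indices.
- Then take consecutive chunks of this permutation and select the samples at those indices. Because each index appears only once in the permutation, no sample is repeated.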
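The two steps above could be sketched like this, again assuming a toy train_dataset and using Subset to do the index-based selection (chunk_size and random_chunks are illustrative names):

```python
import torch
from torch.utils.data import Subset, TensorDataset

# Toy stand-in for train_dataset (an assumption): label i belongs to sample i.
train_dataset = TensorDataset(torch.randn(1000, 4), torch.arange(1000))
chunk_size = 300

# Step 1: a random permutation of all indices, each appearing exactly once.
permutation = torch.randperm(len(train_dataset))

# Step 2: cut the permutation into consecutive pieces of chunk_size indices
# (the last piece may be shorter) and wrap each piece in a Subset.
random_chunks = [
    Subset(train_dataset, index_chunk.tolist())
    for index_chunk in permutation.split(chunk_size)
]
```

Since the chunks partition a single permutation, there is no need to remove used indices by hand after each iteration.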