How to sample a certain percentage of each category from a dataset

I’m quite new to PyTorch. I have a dataset that is an ImageFolder: it contains one folder per class, and each folder holds some images.
I want to split the data into a train set and a test set, but I want to randomly pick 20 percent of each class and put those images into the test set. I have the following code.

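import torch
from torch.utils.data import SubsetRandomSampler
from torchvision.datasets import ImageFolder

dataset = ImageFolder('./data', transform=transform)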

validation_split = 0.2

indices = list(range(len(dataset)))

# TODO: split indices into train_indices and test_indices, taking 20% of each class at random

train_sampler = SubsetRandomSampler(train_indices)
test_sampler = SubsetRandomSampler(test_indices)

train_loader = torch.utils.data.DataLoader(dataset, batch_size=64, sampler=train_sampler, num_workers=16)

test_loader = torch.utils.data.DataLoader(dataset, batch_size=64, sampler=test_sampler, num_workers=16)

How can I do that?

You could use sklearn.model_selection.train_test_split with the stratify argument to create the training and testing indices.
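A minimal sketch of that suggestion, assuming scikit-learn is installed. The helper name `stratified_split_indices` and the toy label list are made up for illustration; with an ImageFolder you would pass `dataset.targets` (the per-image class indices) instead.

```python
from collections import Counter

from sklearn.model_selection import train_test_split


def stratified_split_indices(targets, test_size=0.2, seed=0):
    """Hypothetical helper: return (train_indices, test_indices) so that
    each class keeps roughly the same share in both subsets."""
    indices = list(range(len(targets)))
    train_indices, test_indices = train_test_split(
        indices,
        test_size=test_size,   # fraction of samples that go to the test set
        stratify=targets,      # preserve per-class proportions
        random_state=seed,     # fixed seed so the split is reproducible
    )
    return train_indices, test_indices


# Stand-in labels for illustration; for an ImageFolder use dataset.targets.
toy_targets = [0] * 50 + [1] * 30 + [2] * 20
train_indices, test_indices = stratified_split_indices(toy_targets)

# Count how many samples of each class ended up in the test set.
test_class_counts = Counter(toy_targets[i] for i in test_indices)
print(sorted(test_class_counts.items()))
```

The two index lists can then go straight into the `SubsetRandomSampler(train_indices)` and `SubsetRandomSampler(test_indices)` calls from the question. Note that stratification needs at least two samples in every class, otherwise `train_test_split` raises an error.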
