Class imbalance train test split

I’m trying to manually split an image dataset with 2 classes that make up 94% and 6% of the data. I’m at a loss for how to do this, as so far I’ve just been using a SubsetRandomSampler, but I’d like the minority class to be equally represented in the train/valid/test splits.

How can I make a list of majority+minority images and then pass into ImageFolder?

If you have stored the targets in your Dataset or can somehow precompute them, you could use scikit-learn's train_test_split with the stratify argument to get the training and test indices. Using these indices, you can create a training and a test Dataset using torch.utils.data.Subset. Here is a small dummy example:

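import numpy as np
import torch
from sklearn.model_selection import train_test_split
from torch.utils.data import Dataset, Subset

class MyDataset(Dataset):
    def __init__(self):
        # Random dummy images and imbalanced labels: 940 of class 0, 60 of class 1
        self.data = torch.randn(1000, 3, 24, 24)
        self.targets = torch.cat((
            torch.zeros(940, dtype=torch.long),
            torch.ones(60, dtype=torch.long)
        ))
    def __getitem__(self, index):
        x = self.data[index]
        y = self.targets[index]
        return x, y
    def __len__(self):
        return len(self.data)

dataset = MyDataset()
targets = dataset.targets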
train_indices, test_indices = train_test_split(np.arange(targets.shape[0]), stratify=targets)

# Check class balance
_, train_counts = np.unique(targets[train_indices], return_counts=True)
_, test_counts = np.unique(targets[test_indices], return_counts=True)
print('Train balance {}\nTest balance {}'.format(
    train_counts[1]/train_counts[0], test_counts[1]/test_counts[0]))
> Train balance 0.06382978723404255
> Test balance 0.06382978723404255

train_dataset = Subset(dataset, indices=train_indices)
test_dataset = Subset(dataset, indices=test_indices)
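
Since the question asks for train/valid/test splits, the same idea can be applied twice: first split off the test indices, then split the remainder into training and validation indices, stratifying on the labels each time. The following is only a minimal sketch of that approach; the helper name stratified_three_way_split, the 70/15/15 fractions, and the seed are assumptions chosen for illustration, not something from the example above.

```python
import numpy as np
from sklearn.model_selection import train_test_split


def stratified_three_way_split(targets, valid_size=0.15, test_size=0.15, seed=0):
    """Return (train, valid, test) index arrays with matching class ratios.

    Hypothetical helper: the name, default fractions, and seed are
    illustrative choices, not part of the original answer.
    """
    targets = np.asarray(targets)
    indices = np.arange(len(targets))

    # First split: carve out the test set, stratified on the labels.
    train_valid_idx, test_idx = train_test_split(
        indices, test_size=test_size, stratify=targets, random_state=seed
    )

    # Second split: divide the remainder into train and validation.
    # valid_size is a fraction of the full dataset, so rescale it
    # relative to what is left after removing the test set.
    relative_valid = valid_size / (1.0 - test_size)
    train_idx, valid_idx = train_test_split(
        train_valid_idx,
        test_size=relative_valid,
        stratify=targets[train_valid_idx],
        random_state=seed,
    )
    return train_idx, valid_idx, test_idx


# Same 940/60 label layout as the dummy dataset above.
targets = np.concatenate(
    [np.zeros(940, dtype=np.int64), np.ones(60, dtype=np.int64)]
)
train_idx, valid_idx, test_idx = stratified_three_way_split(targets)

# Report the size and minority-class fraction of each split.
for name, idx in [("train", train_idx), ("valid", valid_idx), ("test", test_idx)]:
    counts = np.bincount(targets[idx], minlength=2)
    print(name, len(idx), "minority fraction:", counts[1] / counts.sum())
```

Each index array can then be wrapped in a Subset just like train_indices and test_indices. For an ImageFolder, the labels should be available through its targets attribute, so you would pass those in place of the dummy targets.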