I have a dataset with a huge difference in the number of images across classes, so I want to get batches with an equal number of images from each class. Does anyone know how to do that?
You could use the WeightedRandomSampler.
The issue was also discussed in this thread.
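For reference, here's a minimal sketch of how WeightedRandomSampler can be set up to counter class imbalance, assuming an ImageFolder-style dataset with a samples list (the variable names are placeholders); each sample is weighted by the inverse frequency of its class:

import torch
from torch.utils.data import DataLoader, WeightedRandomSampler

# labels of every sample; assumed to come from an ImageFolder-style dataset
labels = torch.tensor([label for _, label in dataset.samples])
class_counts = torch.bincount(labels)
# weight each sample by the inverse frequency of its class
sample_weights = 1.0 / class_counts[labels].float()
sampler = WeightedRandomSampler(sample_weights,
                                num_samples=len(labels),
                                replacement=True)
loader = DataLoader(dataset, batch_size=30, sampler=sampler)

Note that this balances the classes in expectation; any single batch can still deviate from a perfect per-class split.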
Hi,
thanks for that answer. I have the same question, but the thing is: can you guarantee that you'll get exactly the same number of samples from each class within a batch this way?
Answering my own question, here's the solution I came up with. It amounts to creating a loader for each class and drawing from all of those loaders for each batch:
import torch.utils.data as torchdata
from torchvision import datasets

dataset = datasets.ImageFolder(
    my_rootfolder,
    transform=my_transforms)

# now create the loaders for the different classes
loaders = []
for class_name in dataset.classes:
    # get the indices in the dataset that belong to that class
    idx = [
        pos for pos, item in enumerate(dataset.samples)
        if item[1] == dataset.class_to_idx[class_name]]
    # construct the corresponding dataloader thanks to a SubsetRandomSampler;
    # note that batch_size here is the number of samples drawn *per class*
    loaders += [torchdata.DataLoader(
        dataset, batch_size=args.batch_size,
        sampler=torchdata.sampler.SubsetRandomSampler(idx),
        **kwargs)]

# now using the thing:
for items in zip(*loaders):
    # items[idx] is the batch for class idx,
    # ...
    pass
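One caveat with this approach: zip stops at the shortest loader, so the larger classes are undersampled within each epoch, and the effective batch size becomes batch_size * num_classes. Assuming each loader yields (inputs, targets) tuples, the per-class batches can be merged into one balanced batch like this:

import torch

for items in zip(*loaders):
    # each element of items is an (inputs, targets) pair from one class loader;
    # concatenate them into a single balanced batch
    inputs = torch.cat([x for x, _ in items], dim=0)
    targets = torch.cat([y for _, y in items], dim=0)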
I have a dataset already split manually into train, validation and test, with two classes (0 and 1) in each split.
I want my network to fetch an equal number of images from each class in every batch.
The constraint is that I don't have an equal number of images in each class:
Train -> class 0 -> 1000, class 1 -> 100
Say my batch size is 30; I need to have 15 from class 0 and 15 from class 1 for every iteration.
How do I achieve this? Also, I am using the VGG16 architecture in Keras.
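In Keras, one way to get this 15/15 split is a custom generator that samples a fixed number of indices from each class per batch. A minimal sketch, assuming the training data is in memory as NumPy arrays x_train and y_train (names hypothetical); the minority class is resampled with replacement:

import numpy as np

def balanced_batches(x, y, batch_size=30, num_classes=2):
    # draw batch_size // num_classes samples from each class per batch;
    # smaller classes are resampled with replacement
    per_class = batch_size // num_classes
    class_idx = [np.flatnonzero(y == c) for c in range(num_classes)]
    while True:
        picks = np.concatenate([
            np.random.choice(idx, per_class, replace=True)
            for idx in class_idx])
        np.random.shuffle(picks)
        yield x[picks], y[picks]

# e.g. with a VGG16 model (tf.keras accepts a Python generator in fit;
# older standalone Keras uses model.fit_generator instead):
# model.fit(balanced_batches(x_train, y_train),
#           steps_per_epoch=len(x_train) // 30, epochs=10)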