Create a DataLoader that samples an equal number of data points from each class

I need to create a dataloader that samples random data points evenly from each class, or, given a probability distribution over the classes, samples each class in that proportion. Is this possible?

Hello,

Yes, you can implement a custom Sampler (see here for details).

May I have an example of creating your own sampler for something like this? I’m assuming I have to go through the dataset sequentially and separate it into the different classes as well?

Have a look at Detectron2’s RepeatFactorTrainingSampler for an example of one approach. Even if you don’t use a Detectron2 model, the sampler is just standard PyTorch code and should work fine with other models. I found that it didn’t deal properly with empty images, however, so I modified it a bit for my own use.
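For the simpler case asked about at the top of the thread, here is a minimal sketch of a class-balanced sampler. It is not the Detectron2 sampler and not a PyTorch built-in: `ClassBalancedSampler`, `labels`, `num_samples`, and `class_probs` are hypothetical names chosen for illustration. It assumes the labels are available up front as plain hashable, sortable values (e.g. Python ints, not tensors), and it draws with replacement, so the class counts are equal only in expectation rather than exactly equal.

```python
import random
from collections import defaultdict


class ClassBalancedSampler:
    """Hypothetical sampler: yields dataset indices, choosing the class of
    each draw according to `class_probs` (uniform over classes by default).

    It only implements __iter__ and __len__, which is what a DataLoader
    needs from the object passed as its `sampler` argument.
    """

    def __init__(self, labels, num_samples, class_probs=None, seed=None):
        # One sequential pass over the labels groups the indices by class.
        self.indices_by_class = defaultdict(list)
        for idx, label in enumerate(labels):
            self.indices_by_class[label].append(idx)
        self.classes = sorted(self.indices_by_class)

        # Default to a uniform distribution over the classes that occur.
        if class_probs is None:
            class_probs = {c: 1.0 / len(self.classes) for c in self.classes}
        self.weights = [class_probs[c] for c in self.classes]

        self.num_samples = num_samples
        self.rng = random.Random(seed)

    def __iter__(self):
        # First pick a class for every draw, then pick a random index
        # belonging to that class (sampling with replacement).
        chosen = self.rng.choices(
            self.classes, weights=self.weights, k=self.num_samples
        )
        for c in chosen:
            yield self.rng.choice(self.indices_by_class[c])

    def __len__(self):
        return self.num_samples


# Toy, heavily imbalanced labels: 90 items of class 0, 10 of class 1.
labels = [0] * 90 + [1] * 10
sampler = ClassBalancedSampler(labels, num_samples=1000, seed=0)
drawn = list(sampler)

# Custom proportions: draw only from class 0.
only_zero = list(
    ClassBalancedSampler(labels, 50, class_probs={0: 1.0, 1: 0.0}, seed=0)
)

# With PyTorch installed, the sampler would be passed in like this
# (`dataset` is assumed to be your own Dataset; do not also set shuffle=True):
# loader = torch.utils.data.DataLoader(dataset, batch_size=32, sampler=sampler)
```

If you only need weighted sampling and not a custom class, PyTorch’s built-in `torch.utils.data.WeightedRandomSampler` takes one weight per sample, so giving each sample the weight 1 / (size of its class) should have a similar balancing effect.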
