Creating a DataLoader with at least 2 instances from each class

I am working on a multiclass classification problem that requires me to compute the intra-class covariance in each epoch. For this reason, I need at least 2 instances from each class: the covariance formula contains the factor 1/(n-1), where n is the number of instances per class, so a class with only a single instance gives a ZeroDivisionError. So, my question is: how do I make a DataLoader which has at least 2 instances per class?

By default all samples will be used in each epoch, so assuming your dataset contains at least two samples per class you wouldn’t have to change anything.
If that’s not the case you could try to oversample the classes with a single sample e.g. with a WeightedRandomSampler or you could create a custom sampler guaranteeing to sample specific classes.
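As a rough sketch of the oversampling idea, the per-sample weights could be set to the inverse of each class's frequency, so that rare classes are drawn more often. The helper name `inverse_frequency_weights` and the toy `labels` list are made up for illustration; only the standard library is used here.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Return one weight per sample, equal to 1 / (size of that sample's class).

    Hypothetical helper for illustration; not part of PyTorch.
    """
    counts = Counter(labels)
    return [1.0 / counts[y] for y in labels]


# Toy labels: class 0 has four samples, class 1 has one, class 2 has two.
labels = [0, 0, 0, 0, 1, 2, 2]
weights = inverse_frequency_weights(labels)
# Each class now carries the same total weight, so the lone sample of
# class 1 is as likely to be drawn as all four samples of class 0 combined.
```

The resulting list could then be passed to `torch.utils.data.WeightedRandomSampler(weights, num_samples=len(weights), replacement=True)`. Note that this only raises the probability of drawing rare classes; it does not by itself guarantee any particular count per batch.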

Well, I need a DataLoader where each batch contains at least 2 samples from each class. Sorry for the confusion!

In this case this topic might be interesting.

Hi, actually I haven’t been able to convey my doubt clearly. What I want is a DataLoader where no batch contains a class with only a single instance.
Let’s say, I have 100 classes → {1,2,3,4,5,…,100}
My dataloader has a batch size of 10.
The kind of batches that I want are → {1,1,2,2,3,3,9,9,9,9} , {79,79,79,32,32,64,64,64,64,64}
The kind of batches that I don’t want are → {1,2,4,8,81,81,81,4,4,4}, because classes 1, 2, and 8 each have only a single instance. This raises an error while calculating the intra-class covariance during training.
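To make the constraint concrete, here is a small check that lists the classes appearing exactly once in a batch of labels. The function name `singleton_classes` is hypothetical; the three batches are the ones from the examples above.

```python
from collections import Counter


def singleton_classes(batch_labels):
    """Return the sorted list of classes that occur exactly once in the batch."""
    counts = Counter(batch_labels)
    return sorted(c for c, n in counts.items() if n == 1)


good_a = [1, 1, 2, 2, 3, 3, 9, 9, 9, 9]
good_b = [79, 79, 79, 32, 32, 64, 64, 64, 64, 64]
bad = [1, 2, 4, 8, 81, 81, 81, 4, 4, 4]

# An empty result means every class in the batch has at least 2 samples.
result_a = singleton_classes(good_a)
result_b = singleton_classes(good_b)
result_bad = singleton_classes(bad)
```

A check like this could be run on each batch's labels during debugging to confirm that a sampler really satisfies the requirement.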

I still think a custom sampler could work, as it’s responsible for creating the sample indices and passing them to Dataset.__getitem__. Creating a custom BatchSampler will also allow you to pass all batch indices to the __getitem__ method and thus load the entire batch.
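One possible shape for such a batch sampler is sketched below. It is only an illustration under some assumptions: the class name `MinPerClassBatchSampler` and its parameters are invented here, the labels are assumed to be available up front as a plain list, and the batch size is assumed to be a multiple of the minimum count per class. The idea is to cut each class's shuffled indices into same-class groups of exactly `min_per_class` and then fill batches with whole groups, so no class can appear alone. Leftover samples that do not fill a group are skipped for that epoch, and classes with too few samples are dropped entirely.

```python
import math
import random
from collections import Counter, defaultdict


class MinPerClassBatchSampler:
    """Yield lists of indices where every class present occurs >= min_per_class times.

    Hypothetical helper for illustration; not part of PyTorch.
    """

    def __init__(self, labels, batch_size, min_per_class=2, seed=None):
        if batch_size % min_per_class != 0:
            raise ValueError("batch_size must be a multiple of min_per_class")
        self.k = min_per_class
        self.groups_per_batch = batch_size // min_per_class
        self.rng = random.Random(seed)
        by_class = defaultdict(list)
        for idx, label in enumerate(labels):
            by_class[label].append(idx)
        # Classes with fewer than k samples can never meet the constraint.
        self.by_class = {c: v for c, v in by_class.items() if len(v) >= self.k}

    def __iter__(self):
        groups = []
        for idxs in self.by_class.values():
            idxs = idxs[:]
            self.rng.shuffle(idxs)
            # Same-class groups of exactly k; a short remainder is skipped this epoch.
            for i in range(0, len(idxs) - self.k + 1, self.k):
                groups.append(idxs[i:i + self.k])
        self.rng.shuffle(groups)
        # Fill each batch with whole groups; the last batch may be smaller.
        for i in range(0, len(groups), self.groups_per_batch):
            yield [idx for g in groups[i:i + self.groups_per_batch] for idx in g]

    def __len__(self):
        n_groups = sum(len(v) // self.k for v in self.by_class.values())
        return math.ceil(n_groups / self.groups_per_batch)


# Toy data: classes 0..9 with 10 samples each, one singleton class 10,
# and one extra sample of class 0 (so class 0 has an odd count).
labels = [i % 10 for i in range(100)] + [10, 0]
sampler = MinPerClassBatchSampler(labels, batch_size=10, seed=0)
batches = list(sampler)
min_count = min(min(Counter(labels[i] for i in b).values()) for b in batches)
```

Since `DataLoader` accepts any iterable of index lists for its `batch_sampler` argument, this could be used as `DataLoader(dataset, batch_sampler=sampler)`, leaving `batch_size`, `shuffle`, `sampler`, and `drop_last` unset. In that mode the indices of each batch are still fetched one at a time through `__getitem__` and then collated. Two groups of the same class may land in one batch, which is fine for the covariance computation since it only needs n ≥ 2.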