I have a classification dataset structured like this:
- class-1: 2500 images
- class-2: 2500 images
- class-3: 2500 images
- ...
- class-10: 2500 images
Each class has internal variation that could be grouped into clusters within that class. For example, if I run k-means separately on each class with 10 clusters, I get:
- class-1: 2500 images split into 10 clusters
- class-2: 2500 images split into 10 clusters
- ...
Now, in the dataloader, for a batch size of 100 I want each batch to contain:
- 10 examples from class-1 (one from each of the ten clusters in class-1)
- 10 examples from class-2 (one from each of the ten clusters in class-2)
- ...
- 10 examples from class-10 (one from each of the ten clusters in class-10)
In total, that gives a well-balanced batch of 100 samples, one for every (class, cluster) pair.
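To make the intended batch layout concrete, here is a minimal pure-Python sketch of the sampling logic, not a definitive implementation. The class name `ClusterBalancedBatchSampler` and the inputs `labels` and `cluster_ids` are hypothetical: it assumes the per-class k-means step has already produced a cluster id for every sample. Each batch it yields contains exactly one dataset index per (class, cluster) pair, so 10 classes with 10 clusters each give batches of 100.

```python
import random
from collections import defaultdict


class ClusterBalancedBatchSampler:
    """Yield batches of dataset indices, one index per (class, cluster) pair.

    labels[i] is the class of sample i and cluster_ids[i] is its cluster
    within that class. Both are assumed inputs, e.g. from per-class k-means.
    """

    def __init__(self, labels, cluster_ids, seed=None):
        if len(labels) != len(cluster_ids):
            raise ValueError("labels and cluster_ids must have the same length")
        # Map each (class, cluster) pair to the dataset indices it contains.
        self.groups = defaultdict(list)
        for idx, key in enumerate(zip(labels, cluster_ids)):
            self.groups[key].append(idx)
        self.rng = random.Random(seed)
        # Sampling is without replacement, so an epoch ends as soon as the
        # smallest group has no unseen samples left.
        self.num_batches = min(len(g) for g in self.groups.values())

    def __len__(self):
        return self.num_batches

    def __iter__(self):
        # Shuffle every group once per epoch, then take the b-th element
        # of each shuffled group to form batch b.
        shuffled = [self.rng.sample(g, len(g)) for g in self.groups.values()]
        for b in range(self.num_batches):
            batch = [g[b] for g in shuffled]
            self.rng.shuffle(batch)  # avoid a fixed class order inside the batch
            yield batch
```

Assuming the dataloader in question is PyTorch's `torch.utils.data.DataLoader`, an object like this could be passed through its `batch_sampler` argument, since it is an iterable that yields lists of indices. One caveat of this sketch: k-means clusters are usually unequal in size, so the smallest cluster caps the number of batches per epoch and larger clusters are only partially visited.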
Is there a way to do this, and how can I do it efficiently?