I have a classification dataset like the one below:
class-1: 2500 images
class-2: 2500 images
class-3: 2500 images
...
...
class-10: 2500 images
Each class has internal variation, so its images can be clustered within the class. For example, if I run k-means separately on each class with 10 clusters, I get:
class-1: 2500 images split into 10 clusters
class-2: 2500 images split into 10 clusters
..
..
Now, for a batch size of 100, I want the dataloader to return:
- 10 examples from class-1 (one from each of the ten clusters in class-1)
- 10 examples from class-2 (one from each of the ten clusters in class-2)
...
- 10 examples from class-10 (one from each of the ten clusters in class-10)
In total, each batch will contain 100 samples, covering every cluster of every class exactly once.
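To make the batch layout above concrete, here is a rough sketch of the sampling I have in mind. It is only an illustration under assumptions: the class name ClusterBalancedBatchSampler and the inputs class_ids and cluster_ids (one class label and one k-means cluster label per dataset index) are hypothetical names, and the cluster labels are assumed to be precomputed. It draws with replacement across batches, so it does not guarantee that every image is seen exactly once per epoch.

```python
import random
from collections import defaultdict


class ClusterBalancedBatchSampler:
    """Yields lists of dataset indices, one index per (class, cluster) pair."""

    def __init__(self, class_ids, cluster_ids, num_batches, seed=0):
        # class_ids[i] and cluster_ids[i] describe dataset item i
        # (hypothetical, precomputed inputs).
        self.groups = defaultdict(list)
        for idx, key in enumerate(zip(class_ids, cluster_ids)):
            self.groups[key].append(idx)
        self.keys = sorted(self.groups)
        self.num_batches = num_batches
        self.rng = random.Random(seed)

    def __len__(self):
        return self.num_batches

    def __iter__(self):
        for _ in range(self.num_batches):
            # One random draw per (class, cluster) group.
            # With 10 classes x 10 clusters this gives 100 indices.
            yield [self.rng.choice(self.groups[key]) for key in self.keys]
```

If this is used with PyTorch, an object like this could presumably be passed as DataLoader(dataset, batch_sampler=sampler), since batch_sampler accepts an iterable that yields lists of indices. The per-batch cost here is one random pick per group, so the k-means step only has to run once up front.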
Is there a way to do this, and how can I do it efficiently?