Dataloader with clustered batches

I need to write a custom dataloader that batches together examples I have already clustered with k-means or another clustering method. I am just looking for ideas on how to approach this, and I am not sure whether collate_fn helps here. I appreciate any help. Thanks.

Hey @gebrahimi

I am thinking of storing the examples in a pandas dataframe with a specific column cluster_number to designate the cluster number that the example belongs to.

We can write a CustomDataset for this so that the __getitem__ gives us the example as a torch tensor and the cluster it belongs to.
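A minimal sketch of what such a dataset could look like, assuming the features are plain numeric columns in the dataframe. The class name ClusteredDataset, the column names x0 and x1, and the toy values are made up for illustration; only cluster_number comes from the description above.

```python
import pandas as pd
import torch
from torch.utils.data import Dataset


class ClusteredDataset(Dataset):
    """Hypothetical dataset: one row per example plus a cluster_number column."""

    def __init__(self, frame, feature_columns, cluster_column="cluster_number"):
        # Convert once up front so __getitem__ is a cheap tensor lookup.
        self.features = torch.tensor(
            frame[feature_columns].to_numpy(), dtype=torch.float32
        )
        self.clusters = torch.tensor(
            frame[cluster_column].to_numpy(), dtype=torch.long
        )

    def __len__(self):
        return len(self.clusters)

    def __getitem__(self, index):
        # Return the example and the cluster it belongs to.
        return self.features[index], self.clusters[index]


# Toy data (assumed): four 2-feature examples in two clusters.
frame = pd.DataFrame(
    {
        "x0": [0.1, 0.2, 5.0, 5.1],
        "x1": [1.0, 1.1, 9.0, 9.2],
        "cluster_number": [0, 0, 1, 1],
    }
)
dataset = ClusteredDataset(frame, feature_columns=["x0", "x1"])
example, cluster = dataset[2]
```

Keeping the cluster ids in their own tensor also makes it easy to hand them to whatever builds the batches later.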

The CustomDataset can then be wrapped in a CustomDataLoader that builds each batch from examples sharing the same cluster_number.
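One possible way to get that behaviour without subclassing DataLoader is a batch sampler: an iterable that yields lists of indices, which the standard DataLoader accepts through its batch_sampler argument. The sketch below is plain Python; the class name ClusterBatchSampler and its parameters are hypothetical, and it assumes the cluster ids are available as a sequence aligned with the dataset indices.

```python
import random
from collections import defaultdict


class ClusterBatchSampler:
    """Hypothetical sampler: every yielded batch holds indices from one cluster."""

    def __init__(self, cluster_ids, batch_size, shuffle=True, seed=None):
        self.batch_size = batch_size
        self.shuffle = shuffle
        self.rng = random.Random(seed)
        # Map each cluster id to the dataset indices that belong to it.
        self.by_cluster = defaultdict(list)
        for index, cluster in enumerate(cluster_ids):
            self.by_cluster[int(cluster)].append(index)

    def __iter__(self):
        batches = []
        for indices in self.by_cluster.values():
            indices = list(indices)
            if self.shuffle:
                self.rng.shuffle(indices)  # shuffle within the cluster
            # Cut the cluster into chunks; the last chunk may be smaller.
            for start in range(0, len(indices), self.batch_size):
                batches.append(indices[start:start + self.batch_size])
        if self.shuffle:
            self.rng.shuffle(batches)  # shuffle the order of the batches
        return iter(batches)

    def __len__(self):
        # Ceiling division per cluster, summed over clusters.
        return sum(
            -(-len(indices) // self.batch_size)
            for indices in self.by_cluster.values()
        )


# Toy cluster assignment (assumed) for seven examples.
cluster_ids = [0, 1, 0, 1, 1, 0, 1]
sampler = ClusterBatchSampler(cluster_ids, batch_size=2, seed=0)
batches = list(sampler)
```

With a dataset in hand, this would be used as DataLoader(dataset, batch_sampler=sampler), leaving batch_size, shuffle, sampler and drop_last at their defaults, since those options cannot be combined with batch_sampler. collate_fn is not needed for the grouping itself, because it only merges samples that have already been selected for a batch.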

The examples would be hard to store in a pandas dataframe if they are anything other than 1D arrays. I would personally save 2D or higher-dimensional examples in a separate folder and use the ImageFolder dataset.