I need to write a custom dataloader that batches together examples I have already clustered with k-means or any other clustering method. I am just looking for ideas on how to approach this, and I am not sure whether collate_fn helps here. I appreciate any help. Thanks.
I am thinking of storing the examples in a pandas DataFrame with a column cluster_number that designates the cluster each example belongs to.
We can write a CustomDataset for this whose __getitem__ returns the example as a torch tensor together with the cluster it belongs to.
A CustomDataLoader can then wrap the CustomDataset.
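Here is a rough sketch of that idea, not a definitive implementation. Instead of subclassing DataLoader, it passes a batch sampler to the stock DataLoader, because collate_fn only sees the samples after they have been picked and so cannot decide which examples end up in a batch. The names ClusteredDataset, ClusterBatchSampler, feature_cols, and the toy columns x0/x1 are made up for illustration, and it assumes each example is a 1D feature vector spread over DataFrame columns.

```python
import random
from collections import defaultdict

import pandas as pd
import torch
from torch.utils.data import DataLoader, Dataset, Sampler


class ClusteredDataset(Dataset):
    """Hypothetical dataset: one row per example plus a cluster_number column."""

    def __init__(self, df, feature_cols, cluster_col="cluster_number"):
        self.features = torch.tensor(df[feature_cols].to_numpy(), dtype=torch.float32)
        self.clusters = torch.tensor(df[cluster_col].to_numpy(), dtype=torch.long)

    def __len__(self):
        return len(self.features)

    def __getitem__(self, idx):
        # Return the example as a tensor together with its cluster id.
        return self.features[idx], self.clusters[idx]


class ClusterBatchSampler(Sampler):
    """Hypothetical sampler: every yielded batch holds indices from one cluster."""

    def __init__(self, cluster_ids, batch_size, shuffle=True, seed=0):
        self.batch_size = batch_size
        self.shuffle = shuffle
        self.rng = random.Random(seed)
        self.by_cluster = defaultdict(list)
        for idx, cluster in enumerate(cluster_ids):
            self.by_cluster[int(cluster)].append(idx)

    def __iter__(self):
        batches = []
        for indices in self.by_cluster.values():
            indices = list(indices)
            if self.shuffle:
                self.rng.shuffle(indices)
            # Chunk each cluster separately; the last chunk may be smaller.
            for start in range(0, len(indices), self.batch_size):
                batches.append(indices[start:start + self.batch_size])
        if self.shuffle:
            self.rng.shuffle(batches)  # mix the order of clusters across batches
        return iter(batches)

    def __len__(self):
        # Number of batches: ceil(cluster size / batch_size), summed over clusters.
        return sum(-(-len(v) // self.batch_size) for v in self.by_cluster.values())


# Toy data, invented for the example.
df = pd.DataFrame({
    "x0": [0.0, 0.1, 5.0, 5.1, 0.2, 5.2],
    "x1": [1.0, 1.1, 9.0, 9.1, 1.2, 9.2],
    "cluster_number": [0, 0, 1, 1, 0, 1],
})
dataset = ClusteredDataset(df, feature_cols=["x0", "x1"])
sampler = ClusterBatchSampler(df["cluster_number"].tolist(), batch_size=2)
loader = DataLoader(dataset, batch_sampler=sampler)

for features, clusters in loader:
    print(clusters.tolist(), tuple(features.shape))
```

When batch_sampler is given, batch_size, shuffle, sampler, and drop_last have to be left at their defaults on the DataLoader, so shuffling is handled inside the sampler here.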
The examples would be hard to store in a pandas DataFrame if they are anything other than 1D arrays. Personally, I would save examples with two or more dimensions in separate folders and use the ImageFolder dataset.