Collate + sampling batches in a particular order


Problem statement: I have images of varying sizes to classify, from 40x42 to 560x420. I want to train on them dynamically:

Partial solution: I get batches of similarly sized images (collate_fn), so that I don't have to pad too much, but the batches are not shuffled.

What I want: how do I shuffle the batches, e.g. take a batch of small images, then a batch of larger images, and so on? Low, high, low, medium, etc.

I don't want to manually label batches in a pandas df :smiley:

One approach would be to create a custom sampler implementing your logic of using the spatial sizes to batch “similar” images together. You could check the implementation of a few built-in samplers, implement your logic in a custom class, and pass it to the DataLoader. I don’t know if you have any information about the image shapes at each index, but it might be worth grabbing this information once, as loading the entire dataset just to check the image shapes might be too expensive.
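A minimal sketch of such a custom batch sampler could look like the following. The class name `SizeBucketBatchSampler` and the `sizes` argument (one `(height, width)` pair per dataset index, collected once up front) are assumptions made for illustration, not part of PyTorch. It is written as a plain Python iterable with `__iter__` and `__len__`, which is what the `batch_sampler` argument of `DataLoader` expects; subclassing `torch.utils.data.Sampler` would work the same way.

```python
import random


class SizeBucketBatchSampler:
    """Yield batches of indices with similar image sizes, in shuffled batch order.

    Hypothetical helper, not part of PyTorch. `sizes` is assumed to hold one
    (height, width) pair per dataset index.
    """

    def __init__(self, sizes, batch_size, seed=0):
        self.batch_size = batch_size
        self.rng = random.Random(seed)
        # Sort indices by image area so that neighbours have similar sizes.
        order = sorted(range(len(sizes)), key=lambda i: sizes[i][0] * sizes[i][1])
        # Cut the sorted order into consecutive chunks of batch_size indices.
        self.batches = [order[i:i + batch_size] for i in range(0, len(order), batch_size)]

    def __iter__(self):
        # Shuffle a copy of the batch list each epoch; batch contents stay fixed.
        batches = list(self.batches)
        self.rng.shuffle(batches)
        return iter(batches)

    def __len__(self):
        return len(self.batches)
```

It would then be passed as `DataLoader(dataset, batch_sampler=SizeBucketBatchSampler(sizes, 32), collate_fn=...)`, leaving out `batch_size` and `shuffle`, since those are mutually exclusive with `batch_sampler`. In this sketch the contents of each batch are fixed and only the batch order changes between epochs.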

@ptrblck Thanks for the quick reply. Yes, I have the shape information, so that's no problem. I will check the sampler implementations. However, during the night I came up with an idea outside PyTorch: sort the df by image size before putting it into the Dataset, then integer-divide each row's position by BATCH_SIZE (position // BATCH_SIZE) to get a batch_id, and then shuffle the batch_ids. But I want to do it with PyTorch's own mechanics, so I am starting to read.
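For reference, the sort / integer-divide / shuffle idea can be sketched in plain Python, with a list of image areas standing in for the df column. The function name `assign_shuffled_batch_ids` and its arguments are hypothetical, chosen only for this illustration:

```python
import random


def assign_shuffled_batch_ids(areas, batch_size, seed=0):
    """Return (batch id for every row, batch ids in shuffled visiting order)."""
    # Step 1: sort row positions by image area.
    order = sorted(range(len(areas)), key=lambda i: areas[i])
    # Step 2: a row's rank in the sorted order, divided by batch_size, is its batch id.
    batch_ids = [0] * len(areas)
    for rank, row in enumerate(order):
        batch_ids[row] = rank // batch_size
    # Step 3: shuffle the batch ids to get the order in which batches are visited.
    unique_ids = list(range((len(areas) + batch_size - 1) // batch_size))
    random.Random(seed).shuffle(unique_ids)
    return batch_ids, unique_ids
```

Rows sharing a batch id have neighbouring areas, so padding within a batch stays small, while the shuffled id list mixes small and large batches over the epoch.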


If I understand correctly, the logic I wanted to implement in the df can be implemented in a Sampler class instead.

Best regards