How to manually define batches for each epoch?

Hi, I am trying to manually define the training batches for each epoch so that each batch contains only samples satisfying a specific condition (e.g., the first batch contains only class-1 samples with image size smaller than 128*128, the second batch only class-2 samples with image size greater than 256*256, and so on; batches may contain different numbers of samples). That said, iterating with `for batch in dataloader` would probably not give me the correct grouping. I am wondering if anyone knows a good way to manually "assign" batches before each training epoch starts?

Thank you!

What you could do is create the batches manually: iterate over your dataset and evaluate a conditional to decide whether each sample should be added to the current batch.
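A minimal sketch of this idea, assuming a map-style dataset whose `__getitem__` returns `(image, label)` pairs; the toy dataset and the conditions below are made up for illustration:

```python
import torch
from torch.utils.data import Dataset

class ToyDataset(Dataset):
    """Illustrative dataset: single-channel images of varying size with labels."""
    def __init__(self):
        self.samples = [
            (torch.zeros(1, 64, 64), 1),
            (torch.zeros(1, 300, 300), 2),
            (torch.zeros(1, 100, 100), 1),
            (torch.zeros(1, 280, 280), 2),
        ]

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        return self.samples[idx]

dataset = ToyDataset()

# One conditional per batch: class 1 with small images, class 2 with large images.
conditions = [
    lambda img, lbl: lbl == 1 and img.shape[-1] < 128,
    lambda img, lbl: lbl == 2 and img.shape[-1] > 256,
]

# Build each batch by scanning the dataset and keeping matching samples.
batches = []
for cond in conditions:
    batch = [(img, lbl)
             for img, lbl in (dataset[i] for i in range(len(dataset)))
             if cond(img, lbl)]
    batches.append(batch)

print([len(b) for b in batches])  # batch sizes may differ
```

This bypasses the DataLoader entirely, so you would also handle collation and shuffling yourself.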

Thanks for your advice! I implemented it with a batch sampler that defines the batches based on some conditionals (ref: But what are PyTorch DataLoaders really? | Scott Condron’s Blog), and it works well.
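For anyone finding this later, here is a rough sketch of the batch-sampler approach; the dataset, the `ConditionBatchSampler` class, and the conditions are my own illustrative names, not from the blog post (images within one batch are kept the same size so the default collate can stack them):

```python
import torch
from torch.utils.data import Dataset, DataLoader, Sampler

class ToyDataset(Dataset):
    """Illustrative dataset with per-sample labels and image sizes."""
    def __init__(self):
        self.labels = [1, 2, 1, 2]
        self.sizes = [64, 300, 64, 300]

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        s = self.sizes[idx]
        return torch.zeros(1, s, s), self.labels[idx]

class ConditionBatchSampler(Sampler):
    """Yields one list of indices per condition; batches may differ in size."""
    def __init__(self, labels, sizes, conditions):
        self.batches = [
            [i for i, (lbl, s) in enumerate(zip(labels, sizes)) if cond(lbl, s)]
            for cond in conditions
        ]

    def __iter__(self):
        return iter(self.batches)

    def __len__(self):
        return len(self.batches)

conditions = [
    lambda lbl, size: lbl == 1 and size < 128,   # class 1, small images
    lambda lbl, size: lbl == 2 and size > 256,   # class 2, large images
]
dataset = ToyDataset()
batch_sampler = ConditionBatchSampler(dataset.labels, dataset.sizes, conditions)

# batch_sampler is mutually exclusive with batch_size/shuffle/sampler/drop_last.
loader = DataLoader(dataset, batch_sampler=batch_sampler)

for images, labels in loader:
    print(images.shape, labels)
```

Rebuilding the sampler (or its `batches` list) before each epoch lets you reassign the groupings per epoch.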
