Shuffle dataset with larger granularity

Imagine we have 10 classes, each with 100 images, and the batch_size is 10. I want the dataloader to select one of the classes at random each time and then draw a batch from the images in that class. Do I need to write a sampler (or use an existing one), or can I handle this some other way?

Hi Ali!

I don’t have an answer to your question, but I do have a question
for you, below.

I’m curious why you would want to do this. Normally you shuffle
datasets and select random batches when you are training a
model, for example, when you are calculating the gradient
(averaged over the batch) to perform a gradient-descent step.

It would seem to be counterproductive in such a situation to
have all of the samples in your batch be from the same class.

Not that it affects the answer to your question, but I am wondering
about the motivation and potential use case behind your question.

Thanks for any ideas or new tricks here.

K. Frank

Hi Frank,
Actually, this post is related to my other post, "Finetuning intermediate layers of resnet18". Take a look at that post; it may make what I’m doing clearer.

It might not be the “correct” way to do it, but you could have the dataset load batch_size images at once and return all of them as a single item. You would then tell the dataloader that the batch size is 1, and each sample in that batch would contain batch_size images that you can reshape however you want.

If you use this method, you could use the index passed to the dataset's `__getitem__` method to load the images of the class of your choice.
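A minimal sketch of this idea, written in plain Python so it runs without any images on disk. The class name `ClassBatchDataset`, its arguments, and the choice of mapping the index to a class with a modulo are all assumptions made for illustration; real code would load and stack image tensors instead of returning list entries. It also assumes every class has at least batch_size samples.

```python
import random


class ClassBatchDataset:
    """Map-style dataset whose items are whole single-class batches (hypothetical)."""

    def __init__(self, samples, labels, batch_size, seed=None):
        self.samples = samples
        self.batch_size = batch_size
        self.rng = random.Random(seed)
        # Group sample positions by class label.
        self.by_class = {}
        for pos, label in enumerate(labels):
            self.by_class.setdefault(label, []).append(pos)
        self.classes = sorted(self.by_class)

    def __len__(self):
        # Number of single-class batches served per epoch.
        return len(self.samples) // self.batch_size

    def __getitem__(self, index):
        # The index only decides which class this item is drawn from.
        label = self.classes[index % len(self.classes)]
        # Draw batch_size distinct samples from that class.
        chosen = self.rng.sample(self.by_class[label], self.batch_size)
        return [self.samples[pos] for pos in chosen], label
```

Wrapped in `torch.utils.data.DataLoader(dataset, batch_size=1, shuffle=True)`, the shuffled index would make the class choice random, and each loaded item would carry an extra leading dimension of size 1 to reshape away. Passing `batch_size=None` instead disables automatic batching, which should avoid that extra dimension.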

Another way to do it is to write your own sampler, as you suggested. I did something similar, but it turned out to be a bit more code than I thought. If you decide to go this route and need some pointers, hit me up :)
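For readers who want a starting point, a batch sampler along these lines might look roughly like the sketch below. This is not the code mentioned above; the class name `SingleClassBatchSampler` and its arguments are hypothetical. It is plain Python: it only yields lists of dataset indices, which is what the `batch_sampler` argument of `torch.utils.data.DataLoader` expects. Note one design choice: it cuts each class into batches without replacement and shuffles the batch order, so every image is seen once per epoch, which differs slightly from picking a class uniformly at random at every step.

```python
import random


class SingleClassBatchSampler:
    """Yields lists of indices where each list comes from one class (hypothetical)."""

    def __init__(self, labels, batch_size, seed=None):
        self.batch_size = batch_size
        self.rng = random.Random(seed)
        # Group dataset indices by class label.
        self.by_class = {}
        for idx, label in enumerate(labels):
            self.by_class.setdefault(label, []).append(idx)

    def __iter__(self):
        batches = []
        for indices in self.by_class.values():
            # Shuffle within the class, then cut into full batches.
            shuffled = indices[:]
            self.rng.shuffle(shuffled)
            last_start = len(shuffled) - self.batch_size
            for start in range(0, last_start + 1, self.batch_size):
                batches.append(shuffled[start:start + self.batch_size])
        # Shuffle the order in which the single-class batches are served.
        self.rng.shuffle(batches)
        return iter(batches)

    def __len__(self):
        # Incomplete trailing batches are dropped.
        return sum(len(v) // self.batch_size for v in self.by_class.values())
```

It would be used as `DataLoader(dataset, batch_sampler=SingleClassBatchSampler(labels, 10))`. When `batch_sampler` is given, `batch_size`, `shuffle`, `sampler`, and `drop_last` must be left at their defaults.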

Actually, what I decided to do is to have 10 different dataloaders, one for each class, and then randomly select among the dataloaders. If you can share the critical part of the sampler code that you have written, maybe I can try your way too.
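A rough sketch of the multiple-dataloader approach, assuming one loader per class stored in a dict. The function name `iterate_random_loader` is hypothetical, and the loaders can be any iterables of batches (for example one `DataLoader` per class), so the sketch itself needs only the standard library. A loader that runs out is simply dropped, and the epoch ends when all of them are exhausted.

```python
import random


def iterate_random_loader(loaders, seed=None):
    """Yield (class_key, batch) pairs, picking a random non-exhausted loader each step."""
    rng = random.Random(seed)
    # One live iterator per class.
    iterators = {key: iter(loader) for key, loader in loaders.items()}
    while iterators:
        key = rng.choice(list(iterators))
        try:
            batch = next(iterators[key])
        except StopIteration:
            # This class has no batches left in the current epoch.
            del iterators[key]
            continue
        yield key, batch
```

A training loop would then read `for class_key, batch in iterate_random_loader(loaders): ...`, calling the function again at the start of each epoch.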


Hello Ali!

Thank you. Okay, I think I see where you might be coming from.
Would I be right to think that your intent is to sample a single-class
batch, and then focus that training step (somehow) on the cluster
assigned to that class?


K. Frank

What do you mean by where I’m coming from? And yes, my intention is almost the same as what you described.