How are the images shuffled within the batches?


(Pablo Rr100) #1

Hi all,

I was wondering how the DataLoader handles the shuffling of images.

Scenario
Let’s say my batch size is 64 and I have a dataset of 64,000 training images; therefore, my training_dataloader will create 1,000 batches, each of which will contain 64 images.

My understanding is that, when selecting which batch comes next in the training loop, a random index is picked among all the batches. Is this right?

Question
Does each of those 1,000 batches always contain the same 64 images, or does every batch sample 64 images randomly each time?

Thank you very much in advance


(Sebastian Raschka) #2

I think the DataLoader just shuffles the complete dataset (i.e., creates an index array over the range [0, num_examples) and shuffles it) at each epoch. Then it sweeps over these indices in sequential order. E.g., if you have a dataset like [0, 1, 2, 3, 4, 5, 6] with batch size 2, a random order could be [4, 2, 1, 6, 0, 3, 5], and the minibatches would then be
[4, 2], [1, 6], [0, 3], [5].

This way, you would have random sampling WITHOUT replacement.
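A minimal sketch of that shuffle-then-chunk behavior in plain Python (illustration only, not the actual DataLoader internals):

```python
import random

# Shuffle all dataset indices once per epoch, then chunk them sequentially.
indices = list(range(7))   # dataset [0, 1, 2, 3, 4, 5, 6]
random.shuffle(indices)    # e.g. [4, 2, 1, 6, 0, 3, 5]

batch_size = 2
batches = [indices[i:i + batch_size] for i in range(0, len(indices), batch_size)]
print(batches)             # e.g. [[4, 2], [1, 6], [0, 3], [5]]
```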

My understanding is that, when selecting which batch comes next in the training loop, a random index is picked among all the batches. Is this right?

This would be random sampling WITH replacement. I am 99.9% sure that this is not what the DataLoader does (although it is also a valid approach; actually, it is even more “correct” if you think of stochastic gradient descent; as far as I know, it is less common though and may not work as well empirically; with large dataset sizes, e.g., > 500k, I doubt you will notice any difference in the resulting model).
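For contrast, a quick sketch of what sampling with replacement would look like (again just plain Python for illustration):

```python
import random

# Sampling WITH replacement: every batch draws its indices independently,
# so the same example can appear multiple times within one "epoch".
num_examples = 7
batch_size = 2
num_batches = 4

batches = [[random.randrange(num_examples) for _ in range(batch_size)]
           for _ in range(num_batches)]
print(batches)  # e.g. [[3, 3], [0, 5], [3, 1], [6, 2]]
```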


#3

You are absolutely correct. If no sampler is passed to the DataLoader and shuffle=True is set, a RandomSampler will be used.
The replacement argument is set to False by default, so basically this line of code will be executed as you’ve described.
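A quick sketch of the two equivalent ways to get this behavior (what I'd expect from current PyTorch; exact internals may vary across versions):

```python
import torch
from torch.utils.data import DataLoader, RandomSampler, TensorDataset

# Toy dataset standing in for the training images
dataset = TensorDataset(torch.arange(10).float().unsqueeze(1))

# shuffle=True makes the DataLoader create a RandomSampler (replacement=False)
loader = DataLoader(dataset, batch_size=4, shuffle=True)

# Equivalent explicit form (when passing a sampler, leave shuffle=False)
loader_explicit = DataLoader(dataset, batch_size=4,
                             sampler=RandomSampler(dataset, replacement=False))

for epoch in range(2):
    # A fresh permutation of all indices is drawn for each epoch,
    # so every example shows up exactly once per epoch.
    print([batch[0].squeeze(1).tolist() for batch in loader])
```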


(Sebastian Raschka) #4

Nice, I didn’t know that this existed!


(Pablo Rr100) #5

Scenario:
The question comes from the fact that I am training a ResNet on CIFAR10; the training loss reaches ~0 at around epoch 100, and I still don’t reach a high enough accuracy on the test set.

I don’t think I am overfitting, since the validation loss keeps decreasing as well, so I didn’t go for increasing the weight decay.

Question:
Any idea how I could keep some room for improvement and not reach 100% training accuracy so fast? @rasbt @ptrblck

I thought about increasing the shuffling, but given your answer this is not an option anymore :sweat_smile:

Thank you!


(Sebastian Raschka) #6

One of the many things to try is data augmentation (some random rotation & translation, for example).
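A sketch of such an augmentation pipeline with torchvision.transforms (the specific degrees/translate/normalization values below are just common starting points, not tuned recommendations):

```python
import torchvision.transforms as T
from torchvision.datasets import CIFAR10

train_transform = T.Compose([
    T.RandomRotation(degrees=10),                     # small random rotation
    T.RandomAffine(degrees=0, translate=(0.1, 0.1)),  # random translation
    T.RandomHorizontalFlip(),
    T.ToTensor(),
    # commonly used CIFAR-10 channel statistics
    T.Normalize((0.4914, 0.4822, 0.4465), (0.2470, 0.2435, 0.2616)),
])

train_set = CIFAR10(root="./data", train=True, download=True,
                    transform=train_transform)
```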


(Pablo Rr100) #7

I am following the ResNet paper settings for CIFAR10.
I think I give much more insight into what could be happening in this new question, after looking at my validation loss.

What is the meaning of the shape?
Could that be the reason why I can’t reach more than 92% testing accuracy?