'shuffle' in dataloader

I noticed something strange: the loss value increases simply when I turn `shuffle` off, like below:
`torch.utils.data.DataLoader(dataset_test, batch_size=batch_size, shuffle=False, num_workers=num_workers, drop_last=True)`
The loss goes from about 0.02 to 0.09. I didn't change anything else but `shuffle`. Does anyone know what is going on? Thanks a lot!


If I understand correctly, did you only turn shuffle off and experience an increase in loss while training? Or had you already trained a model, and turning shuffle off during testing caused an increase in test loss?

Yes, because shuffling allows your model to learn about all classes within a batch.

For example, if `shuffle=False`, your initial batches could be composed of only one class:

In this scenario your model learns to predict class 0 very well until it hits the first non-0 class. By then, the loss is massive.

When `shuffle=True`, your classes are randomly distributed:

So your model learns a little bit about everything, and there's no class it has never seen before.
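A tiny sketch of the effect (the dataset here is a made-up toy, not from the thread): when the dataset is sorted by class, the first unshuffled batch contains only one class, while a shuffled batch almost certainly mixes them.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Toy dataset sorted by class: 50 samples of class 0 followed by 50 of class 1.
x = torch.randn(100, 3)
y = torch.cat([torch.zeros(50, dtype=torch.long), torch.ones(50, dtype=torch.long)])
dataset = TensorDataset(x, y)

# Without shuffling, the first batch is drawn sequentially: only class 0.
_, labels_unshuffled = next(iter(DataLoader(dataset, batch_size=16, shuffle=False)))
print(labels_unshuffled.unique().tolist())  # [0]

# With shuffling, the first batch is a random draw over both classes.
torch.manual_seed(0)
_, labels_shuffled = next(iter(DataLoader(dataset, batch_size=16, shuffle=True)))
print(labels_shuffled.unique().tolist())
```

With a sorted dataset like this, a model trained without shuffling sees nothing but class 0 for the first three batches.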

Yes, it happened during training. Actually, the test set here is a validation set. The loss is computed over one batch.

Thank you for your reply.
But… sorry, I didn't make myself clear before: I was doing a segmentation job, so the model would see all the classes anyway.

The reason @alx mentioned is the main one, and it applies in your case too.
When you provide your images to the model, if there is a particular similarity between many of the instances, for instance only indoor images/street scenes/etc., your model will learn fewer features for telling images apart, even though all classes exist in all images (although in many segmentation tasks this does not apply). So at test/validation time, your model has not learned enough about the different types of images.

A more mathematical explanation concerns how the gradients are updated. For instance, providing an image of a street and then one of a jungle enables the model to capture more general features rather than overfitting on a particular kind of color/construction/texture/… .

In the end, I think shuffling the data during training is mandatory unless we care about the order of the data.


Oh~ there's something wrong with my phrasing. :hot_face: I meant that both training and validation happen during training, i.e. under `model.train()`. Validation just records no gradient via `with torch.no_grad():`, while training does. I just want to log the losses of the training set and the validation set every epoch.
[Can I do that? Put them all under `model.train()`, or do I have to put validation under `model.eval()`? Would that be the reason?]
And yes, I shuffled the training set to train the model. But I think there should be no loss difference whether I turn shuffling of the validation set on or off; after all, the net doesn't learn from it if it doesn't provide gradients. :face_with_monocle:

Ah! In my first post I asked about training loss vs. test loss; now I get what you mean. After training for a while you are trying to validate your model, so this can be considered testing. You have to use the same configuration as testing, which means using `model.eval()` and `torch.no_grad()` while looping over the batches.

In this case, when you turn shuffle off, the batches are not identical to the case where shuffle is on for the validation set. What may happen is that in the 'off' case, a given validation batch has simpler examples. To find out whether your model is working properly, just aggregate the loss over all batches in both the shuffled and unshuffled cases.


OH~ I tried putting validation under `model.eval()` and it worked! There was no difference between on/off. Thank you very much! :heart:
BTW, I also tried computing the loss over one epoch, shuffled and not shuffled, with validation *not* under `model.eval()`. The loss in off-mode was still higher than in on-mode. Does that mean my model wasn't working properly? What's the principle behind that?


You are welcome!
I do not know your model's definition, but I gather you are using BatchNorm and maybe Dropout. These layers act differently in eval and train mode. For instance, if `model.train()` is used, then the mean/variance in BatchNorm will be updated even on the validation set; for every batch you will have different parameters in all BatchNorm layers, which is not desirable, since we want to update the BatchNorm mean/variance only during training, not during testing or validation. That is, after training, each validation batch is normalized with different BatchNorm statistics, updated by the previous validation batch. So the overall loss will not be the same.
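A quick sketch of that behaviour on a standalone BatchNorm layer: in train mode a forward pass moves the running statistics, in eval mode it leaves them untouched.

```python
import torch
import torch.nn as nn

bn = nn.BatchNorm1d(4)
x = torch.randn(8, 4) * 3 + 5        # a batch whose mean is far from zero

bn.train()                            # train mode: forward passes update running stats
before = bn.running_mean.clone()
bn(x)
print(torch.allclose(bn.running_mean, before))   # False: the stats moved

bn.eval()                             # eval mode: running stats are frozen
before = bn.running_mean.clone()
bn(x)
print(torch.allclose(bn.running_mean, before))   # True: the stats are unchanged
```

This is why validating under `model.train()` makes the loss depend on batch order: each batch shifts the statistics that the next batch is normalized with.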


Again, I truly appreciate your timely help :pray: