DataLoader returns identical labels when shuffle is set to False

When I use the PyTorch DataLoader to load my test data, I set shuffle to False like this:
test_loader = DataLoader(test_set, batch_size=batch_size, shuffle=False)
It returns labels like this:
[Screenshot: printed labels tensor]
The labels are all 1, which is impossible because my data includes samples from different classes (and thus different labels).

When I changed the shuffle argument of DataLoader to shuffle=True,
test_loader = DataLoader(test_set, batch_size=batch_size, shuffle=True)

the labels are finally returned normally.

Labels:  tensor([11,  0, 12,  5, 13, 16, 12, 16, 18,  7, 10, 10,  8, 18, 14, 16, 14,  3,
        15,  6,  0, 10,  6, 10,  0, 18, 14,  0,  7, 10,  5, 15, 11,  7,  0,  9,
        11, 13,  8, 11,  6, 16, 10,  8, 10, 18,  9,  4,  7, 10,  5, 18,  3, 12,
         5,  9,  8,  6, 15,  3, 14, 12, 17, 14])

Labels:  tensor([14, 14,  8, 12, 15,  7,  6, 14,  8,  9, 17, 12, 16,  0, 17,  1,  7,  2,
        16, 14, 10, 15,  7,  8, 14, 16,  4, 17,  9, 15,  6,  6,  6, 18,  5,  0,
         8, 10,  2,  0,  8,  6,  5, 17, 16, 18, 10,  9, 11,  7,  7, 10, 18,  7,
         4,  7,  9,  4, 18,  6, 18,  6,  5, 10])

The problem is solved, but I don’t understand why the DataLoader returns identical labels when shuffle is set to False. Can anyone explain this to me?

Maybe this is the reason:
When the folder contains images whose file names start with meaningful prefixes, the dataloader loads the images in alphabetical order.

Take two classes, dog and cat, as an example, with cat given label 1 and dog label 0. Suppose 1000 images are named dog_001.jpg, dog_002.jpg, etc., and 200 images are named cat_001.jpg, … Then the first batches contain only 1s, because the images with the cat prefix come first alphabetically. So this explains your case.
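
To make the ordering argument concrete, here is a small pure-Python sketch. The file names, the cat=1 / dog=0 mapping, and the batch size of 64 are all assumptions taken from the example above or made up for illustration; it only mimics what reading a folder in sorted order would do and does not call PyTorch at all.

```python
# Hypothetical file names following the cat/dog example above.
files = [f"dog_{i:03d}.jpg" for i in range(1, 1001)]
files += [f"cat_{i:03d}.jpg" for i in range(1, 201)]

# Label mapping assumed in the example: cat -> 1, dog -> 0.
label_of = {"cat": 1, "dog": 0}

# Reading the files in alphabetical order puts every cat_* file
# before any dog_* file.
ordered = sorted(files)
labels = [label_of[name.split("_")[0]] for name in ordered]

# Without shuffling, the first batch is just the first slice of that list.
batch_size = 64  # assumed value
first_batch = labels[:batch_size]
print(first_batch)
```

With these assumed numbers, the first 200 labels are all 1, so the first three batches of 64 would contain nothing but 1s.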

Sorry, I don’t understand your explanation. Can you elaborate more on that?
(I can read Japanese, so you can just reply in Japanese too)

Isn’t this the expected behavior when setting shuffle=False?
shuffle=False means that the data is no longer shuffled but returned in order.

The tensor you printed above is probably only your first batch and thus contains only label 1. If you were to print further batches, they would probably contain your other labels.

I am guessing your test_set is a torchvision.datasets.ImageFolder() with your images inside different folders representing your labels?
If so, then this is indeed the expected behavior.
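
A quick way to check this is to look at more than the first batch. The sketch below uses a toy TensorDataset as a stand-in for a test set that is stored class by class; the three classes, 100 samples per class, and batch size of 64 are made-up values, not your actual data.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Toy stand-in for a test set stored class by class:
# 100 samples each of labels 0, 1 and 2 (assumed sizes).
labels = torch.repeat_interleave(torch.arange(3), 100)
features = torch.zeros(len(labels), 4)  # dummy inputs, contents do not matter
test_set = TensorDataset(features, labels)

test_loader = DataLoader(test_set, batch_size=64, shuffle=False)

seen = []
for _, batch_labels in test_loader:
    # With shuffle=False each batch is a contiguous slice of the dataset,
    # so the early batches hold a single class.
    print(torch.unique(batch_labels).tolist())
    seen.append(batch_labels)

# Concatenating all batches recovers the labels in dataset order.
all_labels = torch.cat(seen)
```

Here the first batch contains only one label, while later batches move on to the other classes. Running the same loop over your own test_loader should show whether the other labels appear in later batches.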

Thank you for your explanation!
