DataLoader returns all same labels when shuffle is set to false

SarahTeoh · October 16, 2020, 7:12am

When I use PyTorch DataLoader to load my test data, I set the shuffle to False like this.
test_loader = DataLoader(test_set, batch_size=batch_size, shuffle=False)
It returns labels like this:

The labels are all 1, which is impossible because my data includes samples from different classes (and thus different labels).

When I changed the shuffle argument of DataLoader to shuffle=True,
test_loader = DataLoader(test_set, batch_size=batch_size, shuffle=True)

the labels are finally returned normally.

Labels:  tensor([11,  0, 12,  5, 13, 16, 12, 16, 18,  7, 10, 10,  8, 18, 14, 16, 14,  3,
        15,  6,  0, 10,  6, 10,  0, 18, 14,  0,  7, 10,  5, 15, 11,  7,  0,  9,
        11, 13,  8, 11,  6, 16, 10,  8, 10, 18,  9,  4,  7, 10,  5, 18,  3, 12,
         5,  9,  8,  6, 15,  3, 14, 12, 17, 14])

Labels:  tensor([14, 14,  8, 12, 15,  7,  6, 14,  8,  9, 17, 12, 16,  0, 17,  1,  7,  2,
        16, 14, 10, 15,  7,  8, 14, 16,  4, 17,  9, 15,  6,  6,  6, 18,  5,  0,
         8, 10,  2,  0,  8,  6,  5, 17, 16, 18, 10,  9, 11,  7,  7, 10, 18,  7,
         4,  7,  9,  4, 18,  6, 18,  6,  5, 10])

The problem is solved, but I don’t understand why the DataLoader returns all the same labels when shuffle is set to False. Can anyone explain this to me?

Naruto-Sasuke · October 16, 2020, 9:44am

Maybe this is the reason:
When the folder contains images whose file names start with meaningful prefixes, the images are loaded in alphabetical order.

Take two classes, dog and cat, as an example, with cat given label 1 and dog given label 0. Suppose 1000 images are named dog_001.jpg, dog_002.jpg, etc., and 200 images are named cat_001.jpg, … Then the first batches contain only 1s,
because the images with the cat prefix come first alphabetically. That would explain your case.
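
To make the cat/dog example concrete, here is a minimal sketch in plain Python. The file names, the label mapping, and the batch size are hypothetical, and it assumes the dataset lists its files in sorted order (that depends on the dataset implementation, not on DataLoader itself).

```python
# Hypothetical file names: 5 dog images and 3 cat images.
file_names = [f"dog_{i:03d}.jpg" for i in range(1, 6)]
file_names += [f"cat_{i:03d}.jpg" for i in range(1, 4)]

# Label mapping taken from the example in the post (cat -> 1, dog -> 0).
label_of = {"cat": 1, "dog": 0}

# Assumption: the dataset reads the files in alphabetical order.
ordered_names = sorted(file_names)
labels = [label_of[name.split("_")[0]] for name in ordered_names]

# Without shuffling, the first batch is simply the first slice of that list.
batch_size = 3
first_batch = labels[:batch_size]
print(first_batch)
print(labels)
```

Every cat_* name sorts before every dog_* name, so the first batch holds a single label and the other label only shows up in later batches.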

SarahTeoh · October 20, 2020, 6:36am

Sorry, I don’t understand your explanation. Can you elaborate more on that?
(I can read Japanese, so you can just reply in Japanese too)

RaLo4 · October 20, 2020, 2:48pm

Isn’t this the expected behavior when setting shuffle=False?
shuffle=False means the data is no longer shuffled but returned in order.

The labels you printed first (before the tensors shown above) are probably only your first batch and thus contain only label 1. If you printed further batches, they would probably contain your other labels.

I am guessing your test_set is a torchvision.datasets.ImageFolder() with your images being inside different folders representing your labels?
If so, then this is indeed the expected behavior.
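
The in-order behavior can be reproduced with a small sketch. It uses a toy TensorDataset instead of ImageFolder, and the sizes (3 classes, 6 samples each, batch size 6) are made up for illustration; the only assumption carried over is that the samples are stored class by class.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical toy data: 18 dummy samples stored class by class
# (six of class 0, then six of class 1, then six of class 2).
features = torch.zeros(18, 4)
labels = torch.arange(3).repeat_interleave(6)
dataset = TensorDataset(features, labels)

# shuffle=False walks the dataset from index 0 to the end,
# so each batch here contains exactly one class.
ordered_loader = DataLoader(dataset, batch_size=6, shuffle=False)
ordered_batches = [batch_labels.tolist() for _, batch_labels in ordered_loader]
for batch_labels in ordered_batches:
    print("Labels (shuffle=False):", batch_labels)

# shuffle=True draws a random permutation of the indices each epoch,
# so a batch usually mixes classes.
shuffled_loader = DataLoader(dataset, batch_size=6, shuffle=True)
shuffled_labels = torch.cat([batch_labels for _, batch_labels in shuffled_loader])
print("Labels (shuffle=True): ", shuffled_labels.tolist())
```

Printing every batch, rather than only the first, shows that the other labels are still present with shuffle=False; they just arrive later.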

SarahTeoh · October 21, 2020, 12:29am

Thank you for your explanation!