In PyTorch, when I compute a pixel-intensity histogram of a dataset such as MNIST using the PyTorch DataLoader, I get different results depending on whether shuffle is True or False. In fact, shuffle=False reproduces a reference histogram computed by iterating over the dataset directly, whereas shuffle=True does not. Both settings are supposed to return the same set of samples (i.e. the entire dataset), only in a different order, so I would expect identical histograms.
import torch
import torchvision
import torchvision.transforms as transforms
from tqdm.notebook import tqdm
transform = transforms.Compose([transforms.ToTensor()])
data_set = torchvision.datasets.MNIST(root="./", train=True, download=True, transform=transform)
dl_random = torch.utils.data.DataLoader(data_set, batch_size=1, shuffle=True)
dl_fixed = torch.utils.data.DataLoader(data_set, batch_size=1, shuffle=False)
bins = 255
# Reference histogram, accumulated by iterating over the dataset directly
hist2 = torch.zeros(bins)
for data, labels in tqdm(data_set):
    hist2 += torch.histc(data, bins=bins, min=0, max=1)
# Same histogram, accumulated through each DataLoader
for name, dl in [("fixed", dl_fixed), ("random", dl_random)]:
    hist = torch.zeros(bins)
    for data, labels in tqdm(dl):
        hist += torch.histc(data, bins=bins, min=0, max=1)
    # Total absolute deviation from the reference histogram
    print(name, (hist - hist2).abs().sum())
Running the code above prints, for example:
>>> fixed tensor(0.)
>>> random tensor(40.)
Why do the two loaders not yield the same result (tensor(0.) in both cases)?
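One plausible explanation, offered as an assumption rather than a confirmed diagnosis, is float32 rounding in the accumulators. `torch.zeros(bins)` defaults to float32, which represents integers exactly only up to 2**24 = 16,777,216. The MNIST training set has 60000 × 28 × 28 = 47,040,000 pixels, most of them background, so the first bin likely grows past that limit. Beyond it, each addition is rounded, and the rounding depends on the order in which the per-image counts are added. The sketch below uses made-up counts (not MNIST data) and a hypothetical helper `accumulate` to show that order dependence in isolation.

```python
import torch

# float32 has a 24-bit significand: above 2**24 the spacing between
# representable values is 2, so adding 1 can be rounded away entirely.
limit = torch.tensor(2.0 ** 24, dtype=torch.float32)
lost = (limit + 1.0) == limit


def accumulate(values, dtype):
    """Sum values one at a time into a 0-dim tensor of the given dtype.

    Hypothetical helper that mimics `hist += ...` for a single bin.
    """
    total = torch.zeros((), dtype=dtype)
    for v in values:
        total += torch.tensor(v, dtype=dtype)
    return total.item()


# Illustrative per-step counts, chosen to sit right at the float32 limit.
counts = [2.0 ** 24, 1.0, 1.0]

# Same values, two different orders, float32 accumulator.
forward32 = accumulate(counts, torch.float32)
backward32 = accumulate(list(reversed(counts)), torch.float32)

# Same experiment with a float64 accumulator.
forward64 = accumulate(counts, torch.float64)
backward64 = accumulate(list(reversed(counts)), torch.float64)

print("float32:", forward32, backward32)  # the two orders disagree
print("float64:", forward64, backward64)  # the two orders agree
```

If this is indeed the cause, creating the accumulators with `torch.zeros(bins, dtype=torch.float64)` and adding `torch.histc(...).double()` should make the shuffled and unshuffled histograms agree, since float64 holds integers exactly up to 2**53.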