When I load my data with datasets.ImageFolder and print its length, I expect 54000 images, but print(len(data)) outputs a different number, and when I run it again it outputs yet another number. Does anyone know why it doesn’t show the exact length of the dataset? Is there a limit on the number of images?
Could you post the code you are using to create your Dataset?
Also, some information about the folder structure would be interesting to see.
import torchvision.transforms as transforms
from torchvision import datasets

transforms = transforms.Compose([transforms.RandomRotation(30),
                                 transforms.Resize(30),
                                 transforms.ToTensor(),
                                 transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

Arabic_train_data = datasets.ImageFolder(Arabic_train_dataPath, transform=transforms)
Arabic_test_data = datasets.ImageFolder(Arabic_test_dataPath, transform=transforms)
Here is the code I’ve used. The data path has 10 folders of Arabic handwritten numbers, and each folder contains 6000 BMP image files.
Every time I run
print(len(Arabic_train_data))
it outputs the wrong length for the dataset, and the value increases every time I run it,
although the test data length is 10000, as it should be.
So if you run these commands sequentially, you’ll get different results?
Arabic_train_data = datasets.ImageFolder(Arabic_train_dataPath, transform = transforms)
print(len(Arabic_train_data))
print(len(Arabic_train_data))
print(len(Arabic_train_data))
print(len(Arabic_train_data))
Not when I run them back to back, but after a delay of about 2 minutes the value increases, as if it keeps loading data into the ImageFolder over time.
Another weird thing: the length has now exceeded the number of images in the dataset. It reached 56k, and it is supposed to be 54k.
Are you working on a shared drive or are you moving data around?
This seems really weird, as the samples and targets are created in the __init__ call.
So after initializing the ImageFolder, the length should be constant even if you add new files to the folders.
Or are you reinitializing the dataset also?
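To illustrate the point about __init__, here is a minimal sketch that uses only the standard library. SnapshotFolder and add_empty_files are hypothetical stand-ins written for this example; they are not torchvision code, they only mimic the "scan the folders once at construction" behaviour described above, and the empty .bmp files are placeholders rather than real images.

```python
import os
import tempfile


class SnapshotFolder:
    """Hypothetical stand-in for ImageFolder: scans the directory once, in __init__."""

    def __init__(self, root):
        self.samples = []
        for class_name in sorted(os.listdir(root)):
            class_dir = os.path.join(root, class_name)
            if not os.path.isdir(class_dir):
                continue
            for file_name in sorted(os.listdir(class_dir)):
                if file_name.lower().endswith(".bmp"):
                    self.samples.append((os.path.join(class_dir, file_name), class_name))

    def __len__(self):
        # The length comes from the list built in __init__, not from the disk.
        return len(self.samples)


def add_empty_files(class_dir, start, count):
    """Create `count` empty placeholder .bmp files, numbered from `start`."""
    os.makedirs(class_dir, exist_ok=True)
    for i in range(start, start + count):
        open(os.path.join(class_dir, f"img_{i}.bmp"), "wb").close()


root = tempfile.mkdtemp()
add_empty_files(os.path.join(root, "0"), 0, 3)
add_empty_files(os.path.join(root, "1"), 0, 3)

data = SnapshotFolder(root)
len_at_init = len(data)

# New files appear on disk after the dataset object was created.
add_empty_files(os.path.join(root, "1"), 3, 2)
len_after_new_files = len(data)

# Only creating a new object re-scans the folders.
len_after_reinit = len(SnapshotFolder(root))

print(len_at_init, len_after_new_files, len_after_reinit)  # 6 6 8
```

If the real dataset behaves the same way, a changing len() would mean that the line creating the ImageFolder is being executed again, for example by re-running the notebook cell, while the folder contents differ between runs.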
I am working on Google Colab and connecting it to my Google Drive account.
Still, the length shouldn’t change once the dataset is initialized, so are you reinitializing it somewhere in your code?
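One way to narrow this down would be to count the image files on disk yourself and compare the total with len(Arabic_train_data) right after creating the dataset. The following is only a sketch: count_images_per_class and IMAGE_EXTENSIONS are hypothetical names made up for this example, it assumes the one-subfolder-per-class layout described above with .bmp files only, and the temporary folder at the bottom is demo data, not your dataset.

```python
import os
import tempfile
from collections import Counter

# Assumption: the dataset contains only .bmp files, as described in the thread.
IMAGE_EXTENSIONS = (".bmp",)


def count_images_per_class(root, extensions=IMAGE_EXTENSIONS):
    """Count image files under each class subfolder of `root`, including nested folders."""
    counts = Counter()
    for class_name in sorted(os.listdir(root)):
        class_dir = os.path.join(root, class_name)
        if not os.path.isdir(class_dir):
            continue
        for _dirpath, _dirnames, file_names in os.walk(class_dir):
            counts[class_name] += sum(
                name.lower().endswith(extensions) for name in file_names
            )
    return counts


# Small demo on a temporary folder with two classes.
demo_root = tempfile.mkdtemp()
for class_name, n_files in (("0", 3), ("1", 2)):
    class_dir = os.path.join(demo_root, class_name)
    os.makedirs(class_dir)
    for i in range(n_files):
        open(os.path.join(class_dir, f"img_{i}.bmp"), "wb").close()
# A non-image file that should be ignored by the count.
open(os.path.join(demo_root, "0", "notes.txt"), "w").close()

counts = count_images_per_class(demo_root)
total = sum(counts.values())
print(dict(counts), total)  # {'0': 3, '1': 2} 5
```

With your data, you would call count_images_per_class(Arabic_train_dataPath) a few times, a couple of minutes apart. If the totals on disk change between calls, the folder contents as seen from Colab are changing; a class with more files than expected would point at extra or duplicated files on disk rather than at ImageFolder.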