When I use torchvision.datasets.ImageFolder on a large dataset (~120 000 images) with two folders (fake & real), the dataset PyTorch builds is much smaller than what is on disk: it contains exactly 68 850 images.
Since I write out a CSV file after training, I found that it loads exactly 30 000 fake images and 38 850 real images, so ImageFolder only uses a subset of the actual training dataset. Has anyone had a similar experience, or any advice on how to debug this? I'm measuring the size with len() on the ImageFolder dataset.
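I checked the file extensions, and it turns out they were modified when the files were moved: I was using cp --backup=numbered, so some .png files were renamed to .png.~1~. ImageFolder only picks up files with a recognized image extension, so the renamed files were skipped; the number of files that still ended in .png corresponded exactly to what ImageFolder was able to read.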
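In case it helps anyone hitting the same thing, here is a rough sketch of the check I would run first. It walks the dataset root and counts, per class folder, which files the default extension filter would accept and which it would skip. The function name audit_extensions is just something I made up, and the extension tuple is copied by hand from torchvision.datasets.folder.IMG_EXTENSIONS, so treat it as an assumption and compare it against your installed version.

```python
import os
from collections import Counter

# Assumption: hand-copied from torchvision.datasets.folder.IMG_EXTENSIONS.
# Verify against your installed torchvision before relying on it.
IMG_EXTENSIONS = (
    ".jpg", ".jpeg", ".png", ".ppm", ".bmp", ".pgm", ".tif", ".tiff", ".webp",
)


def audit_extensions(root):
    """Count files under `root` that the default filter would accept or skip.

    Returns (accepted, skipped):
      accepted maps class folder -> number of files with a recognized extension
      skipped  maps (class folder, trailing suffix) -> number of ignored files
    """
    accepted = Counter()
    skipped = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        # The first path component below root is the class folder (e.g. "fake").
        class_name = os.path.relpath(dirpath, root).split(os.sep)[0]
        for name in filenames:
            if name.lower().endswith(IMG_EXTENSIONS):
                accepted[class_name] += 1
            else:
                # For "img.png.~1~" the trailing suffix is ".~1~".
                suffix = os.path.splitext(name)[1]
                skipped[(class_name, suffix)] += 1
    return accepted, skipped
```

Calling audit_extensions on the dataset root and comparing sum(accepted.values()) with len() of the ImageFolder dataset should show whether the gap is explained by extensions alone. Any key in skipped with a suffix like .~1~ points at backup copies created by cp --backup=numbered.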