ImageFolder not getting the complete dataset

Hello,

When I use torchvision.datasets.ImageFolder on a large dataset (~120 000 images) with two folders (fake & real), the total dataset size that PyTorch reports is equal to the size of the real folder (exactly 68 850).

From the CSV file I write after training, I found that the number of fake images is exactly 30 000 (and the number of real images is 38 850). So ImageFolder is only using a subset of the actual training dataset. Has anyone had a similar experience, or any advice on how to debug this? I'm getting the size by calling len() on the ImageFolder dataset.

Could you check the file extensions of all files and make sure that they are using the supported formats?

Also, could you post the folder structure here, please?
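
If it helps, here is a minimal sketch for checking the extensions per class folder. It only uses the standard library; the IMG_EXTENSIONS tuple is my assumption of what torchvision.datasets.folder.IMG_EXTENSIONS contains, so please compare it with your installed version. audit_extensions is a hypothetical helper name, not part of torchvision.

```python
from collections import Counter
from pathlib import Path

# Assumption: mirrors torchvision.datasets.folder.IMG_EXTENSIONS;
# verify against the torchvision version you have installed.
IMG_EXTENSIONS = (".jpg", ".jpeg", ".png", ".ppm", ".bmp",
                  ".pgm", ".tif", ".tiff", ".webp")


def audit_extensions(root):
    """Return {class_name: (n_supported, Counter of skipped suffixes)}."""
    report = {}
    # Each immediate subdirectory of root is treated as one class.
    for class_dir in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        supported = 0
        skipped = Counter()
        # Walk the class folder recursively, files only.
        for f in class_dir.rglob("*"):
            if not f.is_file():
                continue
            # Case-insensitive match on the end of the file name.
            if f.name.lower().endswith(IMG_EXTENSIONS):
                supported += 1
            else:
                skipped[f.suffix.lower() or "<none>"] += 1
        report[class_dir.name] = (supported, skipped)
    return report
```

Calling audit_extensions("path/to/train") (the path is a placeholder) and printing the result should show, for each class, how many files would be picked up and which suffixes are being skipped. The sum of the supported counts is what I would expect len() of the dataset to match.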

Hi Patrick,

I checked the file extensions, and some of them appear to have been changed when the files were moved (I was using cp --backup=numbered, so some .png files became .png.~1~). The files that still ended in .png corresponded exactly to what ImageFolder was able to read.
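
For anyone who runs into the same thing, below is a rough sketch of one possible way to make the backed-up files readable again. It is not necessarily what I ended up doing. restore_backups and the _bak naming scheme are hypothetical choices for illustration. Keep in mind that cp creates a numbered backup when a file with the same name already exists at the destination, so a backup may hold a different image than the file that replaced it, or a duplicate; check before training on both.

```python
import re
from pathlib import Path

# Matches names such as "img.png.~1~" (GNU cp numbered-backup suffix).
BACKUP_RE = re.compile(r"^(?P<stem>.+)(?P<ext>\.png)\.~(?P<n>\d+)~$",
                       re.IGNORECASE)


def restore_backups(root, dry_run=True):
    """Rename 'x.png.~3~' to 'x_bak3.png'; return the (old, new) pairs."""
    planned = []
    # sorted() materialises the listing before any file is renamed.
    for f in sorted(Path(root).rglob("*")):
        m = BACKUP_RE.match(f.name)
        if not (f.is_file() and m):
            continue
        # Hypothetical naming scheme: move the backup index before ".png".
        target = f.with_name(f"{m['stem']}_bak{m['n']}{m['ext']}")
        if target.exists():
            continue  # never overwrite an existing file
        planned.append((f, target))
        if not dry_run:
            f.rename(target)
    return planned
```

With the default dry_run=True nothing is renamed, so the returned pairs can be inspected first; passing dry_run=False performs the renames.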

This was a problem on my end.

Thanks for the help.