ImageFolder not getting the complete dataset

ayrts · July 11, 2020, 8:19am

Hello,

When I use torchvision.datasets.ImageFolder on a large dataset (~120 000 images) with two folders (fake and real), the total dataset size PyTorch reports is equal to the size of the real folder (exactly 68 850).

From the CSV file I write out after training, I found that the number of fake images is exactly 30 000 (and the number of real images is 38 850). So ImageFolder only uses a subset of the actual training dataset. Has anyone had a similar experience, or any advice on how to debug this? I'm getting the size by calling len() on the ImageFolder dataset.

ptrblck · July 12, 2020, 10:04am

Could you check the file extensions of all files and make sure that they are using the supported formats?

Also, could you post the folder structure here, please?
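One way to run the check suggested above is to count file suffixes per class folder and compare them with the extensions ImageFolder accepts by default. This is only a minimal sketch: the helper names `summarize_extensions` and `report` are hypothetical, and the extension set is copied from torchvision's `IMG_EXTENSIONS` as found in recent releases, so it may differ in your installed version.

```python
from collections import Counter
from pathlib import Path

# Assumption: mirrors torchvision.datasets.folder.IMG_EXTENSIONS in recent releases.
IMG_EXTENSIONS = {".jpg", ".jpeg", ".png", ".ppm", ".bmp", ".pgm", ".tif", ".tiff", ".webp"}


def summarize_extensions(root):
    """Return {class_folder_name: Counter({lowercased_suffix: count})} for files under root."""
    summary = {}
    for class_dir in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        # Path.suffix is the part after the last dot, so "a.png.~1~" yields ".~1~".
        summary[class_dir.name] = Counter(
            f.suffix.lower() for f in class_dir.rglob("*") if f.is_file()
        )
    return summary


def report(root):
    """Print one line per (class folder, suffix) and flag suffixes ImageFolder would skip."""
    for name, counts in summarize_extensions(root).items():
        for suffix, n in sorted(counts.items()):
            status = "read" if suffix in IMG_EXTENSIONS else "SKIPPED"
            print(f"{name}: {suffix or '(no extension)'} x {n} [{status}]")
```

Calling `report("path/to/dataset")` (a placeholder path) also shows the top-level folder structure, since each immediate subfolder is treated as one class, the same way ImageFolder treats it.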

ayrts · July 13, 2020, 7:25am

Hi Patrick,

I checked the file extensions, and it seems some of them were modified when the files were moved (I was using cp --backup=numbered, so some .png files were renamed to .png.~1~). The number of files still ending in .png corresponded exactly to what ImageFolder was able to read.
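For anyone hitting the same issue, here is a rough sketch of how the numbered backups could be given a readable extension again. It assumes the backup files are images you actually want in the dataset (they may instead be older copies of files that were overwritten, so check first). The function name `restore_backups` and the `_backupN` naming scheme are made up for illustration.

```python
import re
from pathlib import Path

# Matches names produced by `cp --backup=numbered`, e.g. "img.png.~1~".
BACKUP_RE = re.compile(r"^(?P<stem>.+)(?P<ext>\.[^.]+)\.~(?P<n>\d+)~$")


def restore_backups(root, dry_run=True):
    """Plan (or perform) renames like 'img.png.~1~' -> 'img_backup1.png'.

    Returns a list of (old_path, new_path) pairs. Nothing is renamed while
    dry_run is True, and an existing target file is never overwritten.
    """
    renamed = []
    # Materialize the listing first so renaming does not disturb the directory walk.
    for path in sorted(Path(root).rglob("*.~*~")):
        match = BACKUP_RE.match(path.name)
        if match is None or not path.is_file():
            continue
        target = path.with_name(f"{match['stem']}_backup{match['n']}{match['ext']}")
        if target.exists():
            continue  # skip rather than clobber an existing file
        if not dry_run:
            path.rename(target)
        renamed.append((path, target))
    return renamed
```

Running it once with the default `dry_run=True` and inspecting the returned pairs before renaming anything is the safer order of operations.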

This was a problem on my end.

Thanks for the help.