Partition datasets.ImageFolder to have equal number of images per class

alet · August 17, 2020, 10:04pm

Hi,

I have ImageNet in a datasets.ImageFolder and want to partition the training set into two datasets. Using torch.utils.data.random_split gives a purely random split, so the number of images per class in each part follows a (roughly) binomial distribution instead of being the same for every class (up to rounding). What's the best way to get a split with a uniform number of images per class?

I've seen this similar question, but ImageNet is huge, so the default answer won't do. There's an answer involving indices, but I was hoping for something simpler that exploits the nice structure of datasets.ImageFolder.
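
To make the index-based route concrete, here is a minimal sketch of what I understand it to involve. It assumes the dataset exposes one class label per sample (I believe ImageFolder keeps these in a `targets` list), so no images need to be loaded. `stratified_split_indices` is a hypothetical helper name of my own, not a torchvision function, and the toy labels below just stand in for a real dataset.

```python
import random
from collections import defaultdict


def stratified_split_indices(targets, fraction=0.5, seed=0):
    """Split sample indices so every class sends the same share to each part.

    targets:  sequence of class labels, one per sample
              (assumed to come from something like dataset.targets).
    fraction: share of each class that goes into the first part.
    """
    rng = random.Random(seed)

    # Group sample indices by class label; only labels are touched here.
    by_class = defaultdict(list)
    for idx, label in enumerate(targets):
        by_class[label].append(idx)

    first, second = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)                     # random choice within the class
        cut = round(len(indices) * fraction)     # equal per class, up to rounding
        first.extend(indices[:cut])
        second.extend(indices[cut:])
    return first, second


# Toy labels standing in for dataset.targets: 3 classes, 10 images each.
toy_targets = [0] * 10 + [1] * 10 + [2] * 10
part_a, part_b = stratified_split_indices(toy_targets, fraction=0.5, seed=0)

# With a real dataset this would presumably look like (not run here):
#   dataset = torchvision.datasets.ImageFolder(root)
#   a, b = stratified_split_indices(dataset.targets)
#   train_a = torch.utils.data.Subset(dataset, a)
#   train_b = torch.utils.data.Subset(dataset, b)
```

Since this only walks the label list once, the size of ImageNet shouldn't be an issue, but it still feels like more bookkeeping than I'd expect for such a common need.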

Thanks!