Greetings.
I have been experimenting with the Oxford Flowers102 dataset using a diffusion training script.
Training was shorter than with the code base I used as a reference, and after checking the dataset length, it turned out that in torchvision (v3.6.0), the split is as follows:
Train: 1020 samples
Val: 1020 samples
Test: 6149 samples
while in Hugging Face’s dataset, it is:
Train: 7169 samples
Test: 1020 samples
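For what it's worth, the two layouts do cover the same total number of images, and the Hugging Face train split is exactly the size of torchvision's val and test splits combined. A quick sanity check on the counts above (the pairing itself is my assumption, not something either library documents):

```python
# Counts as reported by the two libraries (quoted from the post above).
tv_train, tv_val, tv_test = 1020, 1020, 6149  # torchvision Flowers102
hf_train, hf_test = 7169, 1020                # Hugging Face dataset

# Both layouts cover the same 8189 images in total.
assert tv_train + tv_val + tv_test == hf_train + hf_test == 8189

# The Hugging Face "train" split matches torchvision's val + test
# combined (1020 + 6149 = 7169), suggesting the splits were remapped
# rather than resampled (assumption, not verified against image IDs).
assert tv_val + tv_test == hf_train
print("totals match: 8189 images")
```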
I would have assumed the train split would take most of the samples, so I was wondering if there is a specific reason for this split?
Thank you for your time.