How to load a dataset with a different number of samples?

Hi, I want to control the total number of training samples, e.g., reduce the MNIST training set from 50,000 to 5,000 uniformly selected samples.

I think that it can be controlled if I use torchvision.datasets.ImageFolder.

Do I have to use torchvision.datasets.ImageFolder for that purpose, or is there another way to control the number of samples?

Another option is to write the __len__() method of your Dataset class so that it returns 5000. Inside the class, keep a map from the index received by __getitem__() to an index into the 50,000 samples, for example one built by uniform random sampling.
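A minimal sketch of that idea, in case it helps. The class name SubsampledDataset, the seed argument, and the stand-in list `full` are made up for illustration. It is written as a plain Python class so it runs without PyTorch; in real code you would probably subclass torch.utils.data.Dataset and pass in the actual MNIST dataset instead of the list.

```python
import random


class SubsampledDataset:
    """Expose a fixed, uniformly sampled subset of a larger map-style dataset.

    Hypothetical helper for illustration; `base` is anything that supports
    len() and integer indexing.
    """

    def __init__(self, base, num_samples, seed=0):
        if num_samples > len(base):
            raise ValueError("num_samples exceeds the size of the base dataset")
        self.base = base
        # Draw num_samples distinct indices uniformly at random, once, so the
        # subset stays the same across epochs.
        rng = random.Random(seed)
        self.indices = rng.sample(range(len(base)), num_samples)

    def __len__(self):
        # Report the reduced size, e.g. 5000 instead of 50000.
        return len(self.indices)

    def __getitem__(self, i):
        # Map the subset index to an index in the full dataset.
        return self.base[self.indices[i]]


# Stand-in for the 50,000-sample training set: (sample, label) pairs.
full = [(i, i % 10) for i in range(50_000)]
subset = SubsampledDataset(full, 5_000)
print(len(subset))  # 5000
```

If you do not need a custom class, torch.utils.data.Subset(dataset, indices) wraps a dataset with a list of indices in much the same way.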


It is working. Thanks!