How to have a custom test train split in MNIST (torchvision dataset)?

(newbie in pytorch here)
The torchvision MNIST dataset comes with 60000 images as the training set and 10000 images as the test set. My requirement is to have around 20000 images as the training set and the rest as the test set. Is this possible?

If you want (not recommended, it is better to leave the test set alone), you can first merge them into one set with:

mergeDataSet = torch.utils.data.ConcatDataset([trainSet,testSet])

and then you can split the merged set (70000 images) into a new training set and a held-out set again (valSet here, which serves as your test set) with:

newTrainSet, valSet = torch.utils.data.random_split(mergeDataSet, [20000, 50000], generator=torch.Generator().manual_seed(42))

I have not tested the code, but it should work.
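
For reference, the two steps above can be wrapped in one small helper. This is only a minimal sketch: the name resplit and the tiny stand-in datasets are my own, chosen so the snippet runs without downloading MNIST. The commented lines at the end show how it would presumably be called on the real torchvision datasets.

```python
import torch
from torch.utils.data import ConcatDataset, TensorDataset, random_split


def resplit(train_set, test_set, n_train, seed=42):
    """Merge two datasets, then randomly split them into n_train samples and the rest."""
    merged = ConcatDataset([train_set, test_set])
    n_rest = len(merged) - n_train  # everything not used for training
    generator = torch.Generator().manual_seed(seed)  # fixed seed gives a reproducible split
    return random_split(merged, [n_train, n_rest], generator=generator)


# Stand-in datasets with MNIST-like shapes (600 + 100 samples instead of 60000 + 10000).
fake_train = TensorDataset(torch.zeros(600, 1, 28, 28), torch.zeros(600, dtype=torch.long))
fake_test = TensorDataset(torch.ones(100, 1, 28, 28), torch.ones(100, dtype=torch.long))

new_train, new_test = resplit(fake_train, fake_test, n_train=200)
print(len(new_train), len(new_test))  # 200 500

# With the real data (assumes torchvision is installed and can download MNIST):
# from torchvision import datasets, transforms
# train_set = datasets.MNIST("data", train=True, download=True, transform=transforms.ToTensor())
# test_set = datasets.MNIST("data", train=False, download=True, transform=transforms.ToTensor())
# new_train, new_test = resplit(train_set, test_set, n_train=20000)  # 20000 / 50000
```

Computing the second length from len(merged) avoids hard-coding 50000, so the lengths always sum to the dataset size, which random_split requires.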


This works perfectly!! Thank you so much!!