Fastest way of loading very small datasets


I was wondering what would be the fastest way to work with very small datasets like MNIST or CIFAR10 in PyTorch.
Is it loading the whole dataset into RAM? If so, how can I do it?

When you define the dataset class, you can read all the data in its constructor so that it is loaded into RAM once (instead of being read from the hard disk on every access).
You can also move the tensors directly onto the GPU (from the main thread) and do the preprocessing directly on the GPU.
The drawback is that the usual image preprocessing is PIL-based, so you would have to reimplement those transforms as PyTorch tensor operations.
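A minimal sketch of the first idea — everything loaded into memory in `__init__`, so `__getitem__` does no disk I/O. The random tensors here are a stand-in for actually reading MNIST/CIFAR10 files; the class name and shapes are illustrative, not from any library:

```python
import torch
from torch.utils.data import Dataset, DataLoader

class InMemoryDataset(Dataset):
    """Reads the whole dataset into RAM (or GPU memory) once, up front."""

    def __init__(self, device="cpu"):
        # Stand-in for e.g. MNIST: in practice you would read the files
        # from disk here, exactly once, then keep the tensors around.
        # Passing device="cuda" would place them directly on the GPU.
        self.images = torch.randn(1000, 1, 28, 28, device=device)
        self.labels = torch.randint(0, 10, (1000,), device=device)

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        # No disk I/O here -- just indexing into already-resident tensors.
        return self.images[idx], self.labels[idx]

ds = InMemoryDataset()
# num_workers=0: with GPU-resident tensors you want the main thread,
# since worker processes cannot share CUDA tensors this way.
loader = DataLoader(ds, batch_size=64, shuffle=True, num_workers=0)
x, y = next(iter(loader))
print(x.shape, y.shape)  # torch.Size([64, 1, 28, 28]) torch.Size([64])
```

Note that if the tensors live on the GPU, keep `num_workers=0` and skip `pin_memory`, since there is nothing left to transfer.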


Is there a forum link/documentation on how to do that?

PyTorch’s data loading tutorial is extensive enough. You can push all of the heavy lifting into the Dataset’s `__getitem__` method, including loading with PIL and conversion to torch tensors.
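For contrast, a sketch of the lazy approach described above: only file paths are kept in memory, and PIL loading plus tensor conversion happen per item in `__getitem__`. The tiny PNGs written to a temp directory are just a stand-in for a real image folder, and the class name is made up for illustration:

```python
import os
import tempfile

import torch
from PIL import Image
from torch.utils.data import Dataset

# Create a few tiny grayscale PNGs to stand in for a real image folder.
tmpdir = tempfile.mkdtemp()
paths = []
for i in range(4):
    p = os.path.join(tmpdir, f"img_{i}.png")
    Image.new("L", (28, 28), color=i * 10).save(p)
    paths.append(p)

class LazyImageDataset(Dataset):
    """All heavy lifting lives in __getitem__: PIL open + torch conversion."""

    def __init__(self, paths):
        self.paths = paths  # only the file list is held in RAM

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, idx):
        img = Image.open(self.paths[idx])  # disk read + PIL decode per item
        t = torch.frombuffer(bytearray(img.tobytes()), dtype=torch.uint8)
        return t.float().div(255).view(1, 28, 28)  # conversion to torch

sample = LazyImageDataset(paths)[0]
print(sample.shape)  # torch.Size([1, 28, 28])
```

For datasets as small as MNIST the in-RAM approach is usually faster, but this pattern is what scales when the data no longer fits in memory.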
