PyTorch DataLoader for a dataset with a lot of images


I have a very big dataset with a lot of images. Is there any faster way than loading single images in the `__getitem__` method? I want to increase the batch size and find the bottleneck: the batch size, the size of my network, …
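For context, a typical baseline looks like the sketch below (the dataset here is a random stand-in; in practice `__getitem__` would decode an image file). The `num_workers` and `pin_memory` settings are the usual first knobs for loading speed:

```python
import torch
from torch.utils.data import Dataset, DataLoader

class RandomImageDataset(Dataset):
    """Stand-in dataset; a real one would decode an image in __getitem__."""
    def __init__(self, n=64):
        self.n = n

    def __len__(self):
        return self.n

    def __getitem__(self, idx):
        # Decoding / augmentation here runs inside the worker processes.
        return torch.rand(3, 224, 224), idx % 10

loader = DataLoader(
    RandomImageDataset(),
    batch_size=16,
    shuffle=True,
    num_workers=2,    # parallel CPU workers running __getitem__
    pin_memory=True,  # page-locked host memory speeds up host-to-GPU copies
)

images, labels = next(iter(loader))
print(images.shape)  # torch.Size([16, 3, 224, 224])
```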

Thank you

Loading images into RAM and onto the GPU can be sped up through custom loaders, but speeding up training through larger batch sizes is a different matter.

If you just want to speed up the loading of images, have a look here at copying the data to the GPU in parallel.
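The idea is roughly the following sketch: with pinned (page-locked) host memory, `Tensor.to(..., non_blocking=True)` can overlap the host-to-device copy with GPU compute. The guard makes it also run on a CPU-only machine, where both calls are effectively no-ops:

```python
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

batch = torch.rand(16, 3, 224, 224)
if device.type == "cuda":
    batch = batch.pin_memory()  # page-locked memory enables asynchronous copies

# non_blocking=True lets the copy overlap with kernels already queued on the GPU
batch = batch.to(device, non_blocking=True)
print(batch.device)
```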


So using this approach I can increase the training speed, right? My final goal is to train the network faster. I added a second RTX 2080 Ti and used nn.DataParallel, and now it's almost twice as fast, but I want it to be even faster.
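The DataParallel setup mentioned above is roughly the following sketch (the model is a toy placeholder; the wrapper is applied only when more than one GPU is visible, so the snippet also runs on CPU). Note that `nn.DistributedDataParallel` generally scales better than `nn.DataParallel`:

```python
import torch
import torch.nn as nn

# Toy model standing in for the real network.
model = nn.Sequential(
    nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 10),
)

# Replicate the model across all visible GPUs; each replica processes
# batch_size / n_gpus samples of every batch.
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model).cuda()

x = torch.rand(16, 3, 32, 32)
if next(model.parameters()).is_cuda:
    x = x.cuda()

out = model(x)
print(out.shape)  # torch.Size([16, 10])
```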

Maybe you can use the HDF5 or LMDB format.
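For the HDF5 route, a minimal sketch with `h5py` could look like this (the file contents are random stand-ins for real images; the file handle is opened lazily so each DataLoader worker gets its own handle, a common pattern because HDF5 handles don't survive forking):

```python
import os
import tempfile

import h5py
import numpy as np
import torch
from torch.utils.data import Dataset

# One-off conversion: pack many small image files into a single HDF5 file.
path = os.path.join(tempfile.mkdtemp(), "images.h5")
with h5py.File(path, "w") as f:
    f.create_dataset("images", data=np.random.rand(64, 3, 64, 64).astype("float32"))
    f.create_dataset("labels", data=np.random.randint(0, 10, size=64))

class H5Dataset(Dataset):
    def __init__(self, path):
        self.path = path
        self.file = None  # opened lazily, once per worker process

    def __len__(self):
        with h5py.File(self.path, "r") as f:
            return len(f["images"])

    def __getitem__(self, idx):
        if self.file is None:
            self.file = h5py.File(self.path, "r")
        img = torch.from_numpy(self.file["images"][idx])
        label = int(self.file["labels"][idx])
        return img, label

ds = H5Dataset(path)
img, label = ds[0]
print(img.shape)  # torch.Size([3, 64, 64])
```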


There is also a tool from NVIDIA for this: DALI. But I have never used it :rofl:


Yes, it should make training faster by loading the data faster. You might be able to apply other optimizations as well to speed things up, e.g. tuning the batch size, the network size, etc.
