LMDB or h5py for data loading in Python

What is the current best practice for loading a large image dataset (500 GB) into PyTorch?
I tried an LMDB-based approach using this repo, and the loading time improved compared to the ImageFolder + DataLoader pair.

Hard disk space and multiprocessing are both factors to consider.

Thanks.
