HDF5 Multi Threaded Alternative

I struggled with the same issue recently and got around it by moving the HDF5 file-opening code into the Dataset.__getitem__ method of my custom class. It works, and in my experiments it is even slightly faster than a setup where every object from the big file is stored in its own small HDF5 file. More importantly, it finally works with num_workers > 1 in the DataLoader.
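Here is a minimal sketch of this pattern. It assumes h5py and PyTorch are installed and that all samples live in one HDF5 dataset under a hypothetical key "data"; the class name LazyH5Dataset is also just illustrative. As a small variation, instead of reopening the file on every __getitem__ call, it opens the file on the first call and caches the handle. Because __init__ never keeps a file handle around, each DataLoader worker opens its own.

```python
import h5py
import torch
from torch.utils.data import Dataset, DataLoader


class LazyH5Dataset(Dataset):
    """Hypothetical example: opens the HDF5 file lazily inside __getitem__."""

    def __init__(self, path, key="data"):  # "data" is an assumed key
        self.path = path
        self.key = key
        self._file = None  # no open handle here, so the object pickles cleanly
        # Read the length with a short-lived handle that is closed immediately.
        with h5py.File(path, "r") as f:
            self._len = len(f[key])

    def __len__(self):
        return self._len

    def __getitem__(self, idx):
        # The first call in each process (main or worker) opens its own handle.
        if self._file is None:
            self._file = h5py.File(self.path, "r")
        # as_tensor also handles scalar elements from 1-D datasets.
        return torch.as_tensor(self._file[self.key][idx])
```

Usage would look something like DataLoader(LazyH5Dataset("big_file.h5"), batch_size=32, num_workers=4), where the file name is a placeholder. One caveat: avoid indexing the dataset in the main process before the DataLoader starts its workers. Otherwise the already-open handle gets copied into forked workers, or fails to pickle under spawn.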

Have you found any other solution? Or have you considered other options for storing and accessing very large datasets?
