I ran into the same issue recently and solved it by moving the HDF5 file-opening code into the Dataset.__getitem__(x) method of my custom class. It works, and according to my experiments it is slightly faster than a setup with many small HDF5 files, each holding one object from the big file. More importantly, it finally works with num_workers > 1 in the DataLoader.
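
A minimal sketch of this approach might look like the following. It assumes h5py and a file whose samples are stored as rows of a single dataset. The class name LazyH5Dataset and the key "data" are hypothetical, chosen only for illustration. This variant keeps the handle open after the first access instead of reopening the file on every call. Either way, no open file handle exists when the DataLoader starts its worker processes.

```python
import h5py
import torch
from torch.utils.data import Dataset


class LazyH5Dataset(Dataset):
    """Hypothetical dataset that opens the HDF5 file inside __getitem__."""

    def __init__(self, path, key="data"):
        self.path = path
        self.key = key          # assumed name of the dataset inside the file
        self._file = None       # no handle yet; each worker opens its own
        # Read only the length, then close the file again so that no open
        # handle is carried over into the worker processes.
        with h5py.File(path, "r") as f:
            self._length = len(f[key])

    def __len__(self):
        return self._length

    def __getitem__(self, index):
        # Open on first access; this runs separately in every worker process.
        if self._file is None:
            self._file = h5py.File(self.path, "r")
        # Assumes each item is an array (one row of the dataset).
        return torch.as_tensor(self._file[self.key][index])
```

It would then be used as usual, for example DataLoader(LazyH5Dataset("big_file.h5"), batch_size=32, num_workers=4), where the file name is a placeholder.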
Have you found any other solution? Or have you considered other options for storing and accessing very large datasets?