Yes, I’ve explored this topic a bit, and here is what I found:
- With version 1.8 of the HDF5 library, working with HDF5 files under multiprocessing is a lot messier (not h5py! I mean the HDF5 library installed on your system: https://unix.stackexchange.com/questions/287974/how-to-check-if-hdf5-is-installed). I highly recommend updating the library to version 1.10, where multiprocessing works better. With 1.8 I was only able to get h5py to work by opening the file in a `with` statement on every access, which seems to add huge overhead, but I didn’t have time to investigate it properly:
import h5py
from torch.utils.data import Dataset

class H5Dataset(Dataset):
    def __init__(self, h5_path):
        self.h5_path = h5_path

    def __getitem__(self, index):
        # Re-open the file on every access: safe under multiprocessing, but slow
        with h5py.File(self.h5_path, 'r') as file:
            # Do something with file and return data
            ...

    def __len__(self):
        with h5py.File(self.h5_path, 'r') as file:
            return len(file["dataset"])
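As a rough sanity check of that overhead claim, here is a sketch that compares reopening a file on every read against reusing one handle. It uses a plain binary file as a stand-in for HDF5 so it runs without h5py installed; the file name, sizes, and iteration count are arbitrary choices of mine, not from the original code:

```python
import os
import tempfile
import timeit

# Create a small stand-in file (contents are just random bytes).
path = os.path.join(tempfile.mkdtemp(), "stand_in.bin")
with open(path, "wb") as f:
    f.write(os.urandom(4096))

def read_reopen():
    # Mirrors the 'with h5py.File(...)' per-__getitem__ pattern above.
    with open(path, "rb") as f:
        return f.read(64)

persistent = open(path, "rb")

def read_persistent():
    # Mirrors the open-once-and-reuse pattern.
    persistent.seek(0)
    return persistent.read(64)

t_reopen = timeit.timeit(read_reopen, number=2000)
t_reuse = timeit.timeit(read_persistent, number=2000)
print(f"reopen: {t_reopen:.4f}s, reuse: {t_reuse:.4f}s")
```

On most systems the reopen loop comes out slower, since each call pays for an `open`/`close` syscall pair on top of the read; the real h5py gap is typically larger because `h5py.File` also re-parses file metadata.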
- With version 1.10 of the HDF5 library, I was able to create the `h5py.File` once in `__getitem__` (lazily, on first access) and reuse it without errors:
class H5Dataset(Dataset):
    def __init__(self, h5_path):
        self.h5_path = h5_path
        self.file = None

    def __getitem__(self, index):
        # Open lazily on first access, so each DataLoader worker process
        # ends up with its own file handle instead of a shared one
        if self.file is None:
            self.file = h5py.File(self.h5_path, 'r')
        # Do something with file and return data
        ...

    def __len__(self):
        with h5py.File(self.h5_path, 'r') as file:
            return len(file["dataset"])
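The key property of the lazy-open pattern is that the file is opened exactly once per process, no matter how many items are fetched. Here is a stripped-down illustration with the `h5py.File` call swapped for a counter so it runs without HDF5 installed; the class and attribute names here are mine, not from any library:

```python
class LazyDataset:
    def __init__(self, path):
        self.path = path
        self.file = None     # opened lazily, once per process
        self.open_count = 0  # how many times we "opened" the file

    def __getitem__(self, index):
        if self.file is None:  # first access in this process
            self.open_count += 1
            self.file = f"handle:{self.path}"  # stand-in for h5py.File(...)
        return (self.file, index)

ds = LazyDataset("data.h5")
items = [ds[i] for i in range(5)]
print(ds.open_count)  # → 1: opened once despite 5 accesses
```

With a real HDF5 file and a DataLoader using worker processes, the same logic means each worker opens its own handle on its first `__getitem__` call, after the fork, which is what avoids the shared-handle errors described above.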