DataLoader: when num_workers > 0, there is a bug

On Stack Overflow, there is an answer to this error:

I encountered the very same issue, and after spending a day trying to marry the PyTorch DataParallel loader wrapper with HDF5 via h5py, I discovered that it is crucial to open h5py.File inside the new process, rather than having it opened in the main process and hoping it gets inherited by the underlying multiprocessing implementation.
Since PyTorch seems to adopt a lazy way of initializing workers, this means that the actual file opening has to happen inside the __getitem__ function of the Dataset wrapper.

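For context, the pattern the answer warns against looks roughly like the sketch below (a reconstruction, not my exact original code): the file is opened once in __init__, i.e. in the main process, and the workers then read through the inherited self.db handle.

import h5py
from torch.utils.data import Dataset

class BrokenHdf5Dataset(Dataset):  # hypothetical reconstruction of the pre-fix pattern
    def __init__(self, hdf5file, imgs_key='images', labels_key='labels'):
        # Opened in the main process; with num_workers > 0 the forked workers
        # inherit this handle, which h5py cannot use safely.
        self.db = h5py.File(hdf5file, 'r')
        self.imgs_key = imgs_key
        self.labels_key = labels_key

    def __len__(self):
        return len(self.db[self.labels_key])

    def __getitem__(self, idx):
        # Reads in the workers go through the inherited handle, which leads to the errors.
        image = self.db[self.imgs_key][idx]
        label = self.db[self.labels_key][idx]
        return {'images': image, 'labels': label}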
So, I modified my code according to the answer, and now it runs well.

import h5py
from torch.utils.data import Dataset

class Hdf5Dataset(Dataset):  # class header and imports added here; the post only showed the methods

    def __init__(self, hdf5file, imgs_key='images', labels_key='labels',
                 transform=None):
        # Store only the path; the file itself is opened inside __len__/__getitem__.
        self.hdf5file = hdf5file
        self.imgs_key = imgs_key
        self.labels_key = labels_key
        self.transform = transform

    def __len__(self):
        # return len(self.db[self.labels_key])
        with h5py.File(self.hdf5file, 'r') as db:
            lens = len(db[self.labels_key])
        return lens

    def __getitem__(self, idx):
        # Open (and close) the file inside the call, i.e. inside the worker process.
        with h5py.File(self.hdf5file, 'r') as db:
            image = db[self.imgs_key][idx]
            label = db[self.labels_key][idx]
        sample = {'images': image, 'labels': label}
        if self.transform:
            sample = self.transform(sample)
        return sample

The with statements are the main modification. In the original code, we did not close the file object returned by h5py.File; after the modification, the file object is closed in the __len__ and __getitem__ methods. So, I want to know: is closing the file object returned by h5py.File necessary?
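For reference, the alternative I am comparing against is a minimal sketch along the lines of the quoted answer (the class name and structure below are mine, not tested code): keep only the path in __init__, open the file lazily on the first __getitem__ call so that the open happens inside each worker process, and never close the handle explicitly.

import h5py
from torch.utils.data import Dataset

class LazyHdf5Dataset(Dataset):  # hypothetical name, a sketch rather than my actual class
    def __init__(self, hdf5file, imgs_key='images', labels_key='labels',
                 transform=None):
        self.hdf5file = hdf5file
        self.imgs_key = imgs_key
        self.labels_key = labels_key
        self.transform = transform
        self.db = None  # opened lazily, so every worker process gets its own handle

    def __len__(self):
        # A short-lived handle is fine here; __len__ is typically called before the workers start.
        with h5py.File(self.hdf5file, 'r') as db:
            return len(db[self.labels_key])

    def __getitem__(self, idx):
        if self.db is None:
            # First call inside this (worker) process: open the file here and keep it open.
            self.db = h5py.File(self.hdf5file, 'r')
        image = self.db[self.imgs_key][idx]
        label = self.db[self.labels_key][idx]
        sample = {'images': image, 'labels': label}
        if self.transform:
            sample = self.transform(sample)
        return sample

Compared to the with version above, this avoids reopening the file for every sample, at the cost of leaving one handle open per worker for the lifetime of the DataLoader.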
