Incorrect data using h5py with DataLoader?

mattrobin · September 5, 2017, 9:38pm

When reading an HDF5 file through h5py, some of the values returned by the DataLoader come out incorrect. The h5py File object is opened read-only in the __init__ of my torch.utils.data.Dataset subclass, and no process has the file open for writing. In __getitem__, the h5py dataset is simply indexed as usual. Occasionally a value comes back as nan or as an extremely small number, when it should be a float in a known range. When I indexed the same item twice in a row, one of the two calls returned the correct value, so the problem is sporadic. It also only happens when running with multiple reader processes; with a single reader the problem does not appear. I was under the impression that h5py could handle multiple concurrent readers as long as no writers were active, but that seems not to be the case. Has anyone run into this and found a good way to resolve it? Thank you!
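For reference, here is a minimal sketch of the setup described above: the file is opened once in __init__ and indexed directly in __getitem__. The file path and the dataset key `values` (assumed to be a 1-D float array) are hypothetical.

```python
import h5py
import torch
from torch.utils.data import Dataset


class H5Dataset(Dataset):
    """Dataset that opens the HDF5 file once, in __init__, as described above."""

    def __init__(self, path, key="values"):
        # Read-only handle; with fork-based DataLoader workers, every worker
        # process inherits this same open handle.
        self.file = h5py.File(path, "r")
        self.data = self.file[key]

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        # Plain h5py indexing: returns a NumPy scalar for a 1-D dataset.
        return torch.tensor(self.data[idx], dtype=torch.float32)
```

With this class, `H5Dataset("data.h5")[0]` returns a 0-d float tensor.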

smth · September 30, 2017, 9:50pm

I believe it’s a known issue with HDF5 / h5py that it doesn’t play well with multiprocessing.
https://groups.google.com/forum/#!topic/h5py/bJVtWdFtZQM

One workaround (Python 3 only) is to switch multiprocessing to the 'spawn' start method at the start of your main script:

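```python
import torch
import torch.multiprocessing

if __name__ == '__main__':
    # 'spawn' re-imports this script in each worker, so guard the call
    torch.multiprocessing.set_start_method('spawn')
```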
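One caveat worth checking: under 'spawn', each DataLoader worker receives a pickled copy of the Dataset, and open h5py objects cannot be pickled, so a handle opened in __init__ (as in the question) will likely fail to reach the workers. A common workaround, which is not part of the reply above, is to store only the file path and open the file lazily, so that each process gets its own handle. This also avoids sharing one inherited handle across forked workers. The sketch below assumes a hypothetical 1-D dataset named `values`; `LazyH5Dataset` and `load_all` are illustrative names only.

```python
import h5py
import torch
from torch.utils.data import DataLoader, Dataset


class LazyH5Dataset(Dataset):
    """Stores only the path; each process opens its own read-only handle."""

    def __init__(self, path, key="values"):
        self.path = path
        self.key = key
        self.file = None  # opened on first access, so the object stays picklable
        # Read the length once, then close the file again.
        with h5py.File(path, "r") as f:
            self.length = len(f[key])

    def __len__(self):
        return self.length

    def __getitem__(self, idx):
        if self.file is None:
            # First access in this process (a worker, or the main process).
            self.file = h5py.File(self.path, "r")
        return torch.tensor(self.file[self.key][idx], dtype=torch.float32)


def load_all(path, num_workers=2):
    """Iterate the whole dataset through a DataLoader and concatenate the batches."""
    loader = DataLoader(LazyH5Dataset(path), batch_size=4, num_workers=num_workers)
    return torch.cat([batch for batch in loader])
```

Call `load_all` (or build your own DataLoader) from inside the `if __name__ == '__main__':` block, after `set_start_method('spawn')`. Avoid indexing the dataset in the main process before the workers start. Otherwise the main process already holds an open handle, and that handle is again pickled under spawn or inherited under fork.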