HDF5 for Multi-GPU Training

Hi everyone,

I have a 64 GB HDF5 file that holds a single 3D tensor of shape [2048, 2048, 2048]. For each batch (batch size = 16), I sample random sub-tensors of shape [64, 64, 64] from it.
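
To make the sampling step concrete, here is a minimal sketch of what I mean. The names `sample_cube` and `sample_batch` are made up for this post, and a small NumPy array stands in for the real dataset; as far as I know an h5py dataset accepts the same slicing syntax, but treat that as an assumption rather than my exact code.

```python
import numpy as np

def sample_cube(volume, edge, rng):
    """Return one random cube of side `edge` from a 3D array-like `volume`."""
    # Pick a corner so that the cube fits entirely inside the volume.
    z, y, x = (int(rng.integers(0, dim - edge + 1)) for dim in volume.shape)
    return volume[z:z + edge, y:y + edge, x:x + edge]

def sample_batch(volume, edge=64, batch_size=16, seed=None):
    """Stack `batch_size` random cubes into one array of shape [B, edge, edge, edge]."""
    rng = np.random.default_rng(seed)
    return np.stack([sample_cube(volume, edge, rng) for _ in range(batch_size)])

if __name__ == "__main__":
    # Small stand-in volume (hypothetical); the real one is [2048, 2048, 2048].
    small = np.arange(32 ** 3, dtype=np.float32).reshape(32, 32, 32)
    batch = sample_batch(small, edge=8, batch_size=4, seed=0)
    print(batch.shape)
```

Each cube is a contiguous block in index space, so one batch needs 16 separate slice reads from the file.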

Because I could not get the HDF5 file to be read correctly by multiple workers, I always use workers = 0 in my dataloader. I believe this is less efficient than it could be and prevents me from doing multi-GPU training. When I tried multiple workers, the data I read back was corrupted: some of the rows appeared shifted (I am not sure; they may be rows from another region of the tensor).

My idea is to do the random sampling of 64-length-edged cubes beforehand and save the result as key-value pairs, where the keys are integers from 0 to N (N = sample size) and the values are the randomly sampled cubes. Then, to read a batch, I just iterate over the keys. Visualization:

  • [“0”] = randomly sampled tensor of size [64,64,64]
  • [“1”] = another randomly sampled tensor of size [64,64,64]
  • [“N”] = another randomly sampled tensor of size [64,64,64]
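
As a rough sketch of that layout, the snippet below writes each pre-sampled cube to its own file named after its key. Storing one `.npy` file per key is only an assumption made for illustration (it is not something I have settled on), and `presample_to_dir` and `load_sample` are hypothetical helper names. Keys here run from 0 to N-1.

```python
import os
import numpy as np

def presample_to_dir(volume, out_dir, n_samples, edge=64, seed=None):
    """Save `n_samples` random cubes as out_dir/0.npy, out_dir/1.npy, ..."""
    rng = np.random.default_rng(seed)
    os.makedirs(out_dir, exist_ok=True)
    for key in range(n_samples):
        # Random corner chosen so the cube stays inside the volume.
        z, y, x = (int(rng.integers(0, dim - edge + 1)) for dim in volume.shape)
        cube = np.asarray(volume[z:z + edge, y:y + edge, x:x + edge])
        np.save(os.path.join(out_dir, f"{key}.npy"), cube)

def load_sample(out_dir, key):
    """Read back the cube stored under integer `key`."""
    return np.load(os.path.join(out_dir, f"{key}.npy"))
```

With this layout, reading a sample only touches one small file, so no file handle has to be shared between readers.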

If I save this data structure as HDF5 again, the same problems will persist and prevent me from using multiple workers in the dataloader or multi-GPU training. How should I save this data so that I can use multiple workers (to increase batch iteration speed) and multi-GPU training?

Any help/recommendations are deeply appreciated!