HDF5 files into a PyTorch DataLoader

Hi,
I have some HDF5 files which are split by X/y and by train/val/test (e.g. one file is train_X.h5, another is train_y.h5, etc.)

I’m trying to load each of them into a PyTorch DataLoader, but I feel that I first need to somehow merge the files (meaning train should be one file) and then load them?

The problem is that I’m a bit of a newbie and don’t have experience working with HDF5, so I would love some help with that.

I don’t have control over the HDF5 files and they are static, so I have to load them as they are.

What should I create for that? Which classes?

Thanks in advance,
Daniel

Hi,

I solved it using a great post: DataLoader, when num_worker >0, there is bug - #16 by piojanu

Basically, since my HDF5 files were separate (X and y are stored in different files), I had to load them separately.
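
For anyone landing here later, this is a minimal sketch of one common pattern for this setup. It is not a copy of the linked post. It assumes each file holds a single HDF5 dataset under the key "data" (a guess; check yours with `list(h5py.File(path, "r").keys())`) and that row i of the X file matches row i of the y file. `PairedH5Dataset` is a hypothetical name. The files are opened lazily inside `__getitem__`, so each DataLoader worker ends up with its own handles instead of sharing one opened in the parent process.

```python
import h5py
import numpy as np
import torch
from torch.utils.data import Dataset, DataLoader


class PairedH5Dataset(Dataset):
    """Hypothetical helper: samples from one HDF5 file, labels from another."""

    def __init__(self, x_path, y_path, x_key="data", y_key="data"):
        # "data" is an assumed dataset key; replace it with the real one.
        self.x_path, self.y_path = x_path, y_path
        self.x_key, self.y_key = x_key, y_key
        self._fx = self._fy = None  # file handles, opened on first access
        self._x = self._y = None    # h5py datasets inside those files
        # Open briefly to read the length and check both files line up.
        with h5py.File(x_path, "r") as fx, h5py.File(y_path, "r") as fy:
            self._len = fx[x_key].shape[0]
            if fy[y_key].shape[0] != self._len:
                raise ValueError("X and y files have different numbers of rows")

    def _open(self):
        # Runs once per process, so every worker opens its own handles.
        if self._fx is None:
            self._fx = h5py.File(self.x_path, "r")
            self._fy = h5py.File(self.y_path, "r")
            self._x = self._fx[self.x_key]
            self._y = self._fy[self.y_key]

    def __len__(self):
        return self._len

    def __getitem__(self, idx):
        self._open()
        # Only the requested row is read from disk.
        x = torch.from_numpy(np.asarray(self._x[idx]))
        y = torch.from_numpy(np.asarray(self._y[idx]))
        return x, y


# Usage (paths as in the question):
# train_ds = PairedH5Dataset("train_X.h5", "train_y.h5")
# train_loader = DataLoader(train_ds, batch_size=32, shuffle=True, num_workers=2)
```

With this there is no need to merge the X and y files; you create one such dataset per split (train, val, test) and wrap each in its own DataLoader.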


Hi @Daniel_Hen!

What is the size of your .h5 data? With the solution you shared, was the overhead reduced when loading the entire training dataset?

Thank you.