Hi!
I am working on a problem where I must read my data from a table-like HDF5 file.
The problem is that, given how I am providing the sampling (with a batch_sampler) and how the HDF5 file is structured, the time to read one item is almost the same as the time to read the whole batch (25k items).
I would like to know if there is some way to tell the dataset and dataloader to load one whole batch at a time, to avoid reading multiple times from the HDF5 file.
You could use a BatchSampler and pass all indices directly to Dataset.__getitem__, which would allow you to load multiple samples at once. This code shares an example.
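Here is a minimal sketch of that approach. The file name "data.h5", the dataset key "features", and the class name H5BatchDataset are placeholders, and this assumes the HDF5 dataset is a 2D array of samples:

```python
import h5py
import torch
from torch.utils.data import Dataset, DataLoader, BatchSampler, RandomSampler

class H5BatchDataset(Dataset):
    """Reads a whole batch of rows from an HDF5 dataset in a single call."""
    def __init__(self, path, key):
        self.path = path
        self.key = key
        with h5py.File(path, "r") as f:
            self.length = f[key].shape[0]
        self.file = None  # open lazily so each worker gets its own handle

    def __len__(self):
        return self.length

    def __getitem__(self, indices):
        # `indices` is the full list produced by the BatchSampler.
        if self.file is None:
            self.file = h5py.File(self.path, "r")
        # h5py fancy indexing requires indices in increasing order.
        batch = self.file[self.key][sorted(indices)]  # one read per batch
        return torch.from_numpy(batch)

dataset = H5BatchDataset("data.h5", "features")  # hypothetical file/key names
batch_sampler = BatchSampler(RandomSampler(dataset), batch_size=25_000, drop_last=False)

# batch_size=None disables automatic batching, so each list of indices
# yielded by the sampler is passed straight to __getitem__.
loader = DataLoader(dataset, sampler=batch_sampler, batch_size=None)

for batch in loader:
    pass  # batch already contains the 25k samples read in one go
```

Note that with batch_size=None the default collate is skipped, so __getitem__ is responsible for returning the batch in its final form (here a single tensor).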
I ended up doing exactly that, thanks!