Hi!
I am working on a problem where I must read my data from a table-like HDF5 file.
The problem is that, given how I am providing the sampling (with a batch_sampler) and how the HDF5 file is structured, the time to read one item is almost the same as the time to read the whole batch (25k items).
I would like to know if there is some way to tell the dataset and dataloader to load one whole batch at a time, to avoid reading multiple times from the HDF5 file.
You could use a BatchSampler and pass all indices directly to Dataset.__getitem__, which would allow you to load multiple samples at once. This code shares an example.
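Here is a minimal sketch of that approach. The file name "data.h5", the dataset key "features", and the class name H5BatchDataset are placeholders, and this assumes the HDF5 dataset is a 2D array of samples:

```python
import h5py
import torch
from torch.utils.data import Dataset, DataLoader, BatchSampler, RandomSampler

class H5BatchDataset(Dataset):
    """Reads a whole batch of rows from an HDF5 dataset in a single call."""
    def __init__(self, path, key):
        self.path = path
        self.key = key
        with h5py.File(path, "r") as f:
            self.length = f[key].shape[0]
        self.file = None  # open lazily so each worker gets its own handle

    def __len__(self):
        return self.length

    def __getitem__(self, indices):
        # `indices` is the full list produced by the BatchSampler.
        if self.file is None:
            self.file = h5py.File(self.path, "r")
        # h5py fancy indexing requires indices in increasing order.
        batch = self.file[self.key][sorted(indices)]  # one read per batch
        return torch.from_numpy(batch)

dataset = H5BatchDataset("data.h5", "features")  # hypothetical file/key names
batch_sampler = BatchSampler(RandomSampler(dataset), batch_size=25_000, drop_last=False)

# batch_size=None disables automatic batching, so each list of indices
# yielded by the sampler is passed straight to __getitem__.
loader = DataLoader(dataset, sampler=batch_sampler, batch_size=None)

for batch in loader:
    pass  # batch already contains the 25k samples read in one go
```

Note that with batch_size=None the default collate is skipped, so __getitem__ is responsible for returning the batch in its final form (here a single tensor).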
I ended up doing exactly that, thanks!