In a case where my data fits in memory as a numpy array, I've noticed that building batches through __getitem__ from the Dataset interface is much slower than indexing the array directly with numpy.
I'm sure it's because the DataLoader builds each batch sample by sample, calling __getitem__ once per sample.
Is there any workaround to build batches faster while still using the standard Dataset / DataLoader interface?
For instance, I can load all the data into memory in __init__ and then return one row per __getitem__ call, but since __getitem__ fetches rows one at a time, this is definitely slower than fetching the whole batch with a numpy slice like data[0:batch_size].
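To make the comparison concrete, here is a minimal sketch of the pattern I mean (the dataset class, array sizes, and batch size are just illustrative, not my real code):

```python
import numpy as np
import torch
from torch.utils.data import Dataset, DataLoader

class InMemoryDataset(Dataset):
    """Toy dataset holding everything in one numpy array (hypothetical sizes)."""
    def __init__(self, n_samples=10_000, n_features=128):
        self.data = np.random.rand(n_samples, n_features).astype(np.float32)

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        # Called once per sample: the DataLoader fetches rows one by one
        # and then collates them into a batch.
        return self.data[idx]

ds = InMemoryDataset()
loader = DataLoader(ds, batch_size=256, shuffle=False)

# Batch built sample by sample via __getitem__, then collated:
batch = next(iter(loader))

# The same batch built with a single vectorized numpy slice:
direct = torch.from_numpy(ds.data[0:256])

assert torch.equal(batch, direct)  # same values, very different cost
```

Timing these two paths (e.g. with timeit) shows the per-sample __getitem__ route paying Python-call and collation overhead for every row, while the slice is one contiguous copy.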