Dataloading with shuffle=True very slow

I have a dataset with multiple .h5 files, and I have applied the usual pipeline tricks, i.e. opening the files lazily in `__getitem__` and keeping them open afterwards.
Each .h5 file corresponds to one speaker, and I have made a custom sampler that shuffles the indices so that in each batch there are samples from different speakers.
However, when I train with this sampler (note that the data are read from an HDD), training is very slow, whereas when I remove the `sampler` argument from the DataLoader, so that it falls back to essentially sequential sampling, training is fast.
Does this have to do with the data being stored in consecutive positions on disk? I load each sample separately, so the access pattern is different for every sample. Why does shuffling slow training down so much?
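For reference, the pattern I describe above looks roughly like this (a minimal sketch: the file paths and the `features` dataset key are placeholders, and the real dataset is more involved):

```python
import h5py
import torch
from torch.utils.data import Dataset


class SpeakerH5Dataset(Dataset):
    """Minimal sketch of the lazy-open pattern: one .h5 file per speaker,
    file handles created on first access in __getitem__ and kept open."""

    def __init__(self, file_paths, dataset_key="features"):
        self.file_paths = file_paths
        self.dataset_key = dataset_key            # placeholder name for the HDF5 dataset
        self._handles = [None] * len(file_paths)  # one handle per speaker file
        # Flat index of (file_idx, sample_idx) pairs over all speakers
        self.index = []
        for file_idx, path in enumerate(file_paths):
            with h5py.File(path, "r") as f:
                n_samples = f[dataset_key].shape[0]
            self.index.extend((file_idx, i) for i in range(n_samples))

    def __len__(self):
        return len(self.index)

    def __getitem__(self, idx):
        file_idx, sample_idx = self.index[idx]
        if self._handles[file_idx] is None:
            # Opened lazily so each DataLoader worker gets its own handle
            self._handles[file_idx] = h5py.File(self.file_paths[file_idx], "r")
        sample = self._handles[file_idx][self.dataset_key][sample_idx]
        return torch.from_numpy(sample)
```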

Yes, this is expected behavior when using an HDD, as random accesses are more expensive due to the seek time of the spinning magnetic platters.

I tried loading the same data after copying them to an SSD and noticed the speed is about the same as when I load all of the data into RAM.
Does this mean that the data loading pipeline has reached its full speed, and the final training speed now depends only on the model?

This could be the case, in particular if you are using multiple workers, which preload the next batches in the background while the model is training. You could check the data loading time as done in the ImageNet example; it should converge towards 0 if the workers are fast enough to preload the next batches in time.
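A minimal sketch of that measurement (the TensorDataset and loader below are just stand-ins for your real pipeline; the ImageNet example tracks the same quantity with an AverageMeter):

```python
import time
import torch
from torch.utils.data import DataLoader, TensorDataset


def measure_data_time(loader):
    """Average time per batch that the main process spends waiting for data."""
    data_time_sum, n_batches = 0.0, 0
    end = time.time()
    for (batch,) in loader:
        data_time_sum += time.time() - end   # time spent blocked on the loader
        n_batches += 1
        # ... forward / backward / optimizer step would go here ...
        end = time.time()
    return data_time_sum / n_batches


if __name__ == "__main__":
    # Stand-in dataset/loader so the snippet runs on its own;
    # replace with your real dataset, sampler, and DataLoader.
    loader = DataLoader(TensorDataset(torch.randn(1024, 16)),
                        batch_size=32, num_workers=2)
    print(f"average data loading time per batch: {measure_data_time(loader):.4f} s")
```

If this number stays close to 0 after the first few batches, the workers are keeping up and the model is the bottleneck; if it grows, data loading is the bottleneck.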

Thanks, I did not know I could check this; I will test it!