What is the PyTorch-recommended approach to ensure sequential reads during data loading?

I’m running training on ImageNet data and using the generic torchvision.datasets.ImageFolder loader for my use case. For higher batch sizes, the data loader with transformations takes substantial time to fetch each batch compared to the overall compute time. I have tried to improve performance by scaling num_workers up to the maximum possible for my machine.
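For reference, here is a minimal sketch of the kind of setup I mean (the dataset path, batch size, and transform values below are placeholders, not my exact code):

```python
import torch
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# Typical ImageNet-style preprocessing; exact values are illustrative.
train_tf = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

# "/data/imagenet/train" is a placeholder path.
train_ds = datasets.ImageFolder("/data/imagenet/train", transform=train_tf)

train_loader = DataLoader(
    train_ds,
    batch_size=256,           # large batches amplify per-batch loading cost
    shuffle=True,
    num_workers=8,            # scaled to the machine, as described above
    pin_memory=True,          # speeds up host-to-GPU copies
    persistent_workers=True,  # avoids re-forking workers every epoch
)
```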

I definitely need a better way of preprocessing the data and storing it in a fashion that allows more sequential reads. I have found many examples of custom Dataset implementations that tackle this, but I’m looking for the standard recommended approach for this case.
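For illustration, one pattern I have seen is to decode and resize all images once offline, write them into one large array on disk, and memory-map that array at training time, so workers read from a single big file instead of opening thousands of small JPEGs. A rough sketch, where the file names, shapes, and normalization are my assumptions rather than any library API:

```python
import numpy as np
import torch
from torch.utils.data import Dataset

class PreprocessedImageNet(Dataset):
    """Serves decoded, resized images from one large memory-mapped array,
    avoiding per-sample small-file opens and decodes. Assumes the arrays
    were built offline, e.g. with np.save after a one-time decode pass."""

    def __init__(self, images_path, labels_path):
        # (N, 224, 224, 3) uint8 array, memory-mapped so it is read lazily
        self.images = np.load(images_path, mmap_mode="r")
        # (N,) int64 array of class indices
        self.labels = np.load(labels_path)

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        # Copy the slice out of the mmap, then convert HWC uint8 -> CHW float
        img = torch.from_numpy(np.ascontiguousarray(self.images[idx]))
        img = img.permute(2, 0, 1).float().div_(255)
        return img, int(self.labels[idx])
```

Note that with a shuffling sampler the reads are still random offsets into the file; the gain is mainly from skipping JPEG decoding and small-file overhead, and chunked/shard-level shuffling would be needed for truly sequential access.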

This post, and the thread generally, could be helpful; DALI is also discussed there, which might speed up your data loading and processing in case the CPU is “overworked”.
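As a rough illustration of the DALI route, a minimal pipeline that moves JPEG decoding and augmentation onto the GPU might look like the sketch below (assuming DALI is installed; the data path, batch size, and thread count are placeholders, not tuned values):

```python
from nvidia.dali import pipeline_def, fn, types
from nvidia.dali.plugin.pytorch import DALIGenericIterator

@pipeline_def(batch_size=256, num_threads=4, device_id=0)
def train_pipe(data_dir):
    # Reads raw JPEG bytes and labels from an ImageFolder-style directory
    jpegs, labels = fn.readers.file(file_root=data_dir,
                                    random_shuffle=True,
                                    name="Reader")
    # "mixed" decodes on the GPU, offloading work from the CPU
    images = fn.decoders.image(jpegs, device="mixed")
    images = fn.random_resized_crop(images, size=224)
    images = fn.crop_mirror_normalize(
        images,
        dtype=types.FLOAT,
        output_layout="CHW",
        mean=[0.485 * 255, 0.456 * 255, 0.406 * 255],
        std=[0.229 * 255, 0.224 * 255, 0.225 * 255],
    )
    return images, labels

pipe = train_pipe(data_dir="/data/imagenet/train")  # placeholder path
pipe.build()
loader = DALIGenericIterator(pipe, ["images", "labels"], reader_name="Reader")

for batch in loader:
    images = batch[0]["images"]  # already resident on the GPU
    labels = batch[0]["labels"]
    # training step goes here
```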