How to do data preprocessing for a large amount of data?

Niraja · April 24, 2023, 4:01am

I’m working with a large amount of image data, and this is my first time handling a dataset of this size. How do you approach data preparation for such massive amounts of data?

ptrblck · April 24, 2023, 8:16am

The common approach would be to load and process the data lazily, i.e. one sample at a time when it is actually needed, to keep memory usage low. This also means that each transformation is applied only to the loaded sample, not during an offline preprocessing step.
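A minimal sketch of lazy loading with PyTorch's `torch.utils.data.Dataset`, assuming the images are stored as individual files on disk. The class name `LazyImageDataset` and its `paths`/`transform` parameters are hypothetical. Only the list of file paths is held in memory; each image is decoded and transformed inside `__getitem__`, i.e. only when that sample is requested.

```python
from pathlib import Path
from typing import Callable, Optional, Sequence

from PIL import Image
from torch.utils.data import Dataset


class LazyImageDataset(Dataset):
    """Hypothetical dataset that keeps only file paths in memory.

    Images are opened, decoded and transformed one at a time in __getitem__,
    so memory usage does not grow with the number of images.
    """

    def __init__(self, paths: Sequence[str], transform: Optional[Callable] = None):
        # Storing paths is cheap compared to storing decoded pixel data.
        self.paths = [Path(p) for p in paths]
        self.transform = transform

    def __len__(self) -> int:
        return len(self.paths)

    def __getitem__(self, index: int):
        # Read only this one sample from disk; convert to RGB so grayscale or
        # RGBA files yield a consistent number of channels.
        with Image.open(self.paths[index]) as f:
            img = f.convert("RGB")
        # The transformation runs on this loaded sample only.
        if self.transform is not None:
            img = self.transform(img)
        return img
```

Wrapping such a dataset in a `torch.utils.data.DataLoader` with `num_workers > 0` lets several samples be loaded and transformed in parallel worker processes while the model trains.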
The ImageNet example uses such an approach and might be a good reference.