Here is my question:
I have roughly 400,000 training samples, each stored as a separate CSV file (~35 GB in total). I have a custom dataset object that reads these CSV files in
__getitem__. Currently, each epoch takes roughly 70 minutes with a batch size of 512.
So, I was wondering if there’s any way to speed up training without adding additional resources?
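For reference, a minimal sketch of the kind of setup described above (class and variable names are illustrative, not the actual code; pandas and PyTorch are assumed, and the tiny synthetic CSVs just let the sketch run end to end):

```python
import pathlib
import tempfile

import pandas as pd
import torch
from torch.utils.data import DataLoader, Dataset


class CSVDataset(Dataset):
    """One CSV file per sample; parsed lazily in __getitem__."""

    def __init__(self, paths):
        self.paths = paths  # in the scenario above, ~400,000 CSV paths

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, idx):
        # One file open + CSV parse per sample, every epoch --
        # a likely bottleneck compared to a pre-parsed binary format
        df = pd.read_csv(self.paths[idx])
        return torch.tensor(df.values, dtype=torch.float32)


# Tiny synthetic stand-in for the real data so the sketch is runnable
tmp = pathlib.Path(tempfile.mkdtemp())
for i in range(4):
    pd.DataFrame({"a": [1.0, 2.0], "b": [3.0, 4.0]}).to_csv(
        tmp / f"{i}.csv", index=False
    )
paths = sorted(tmp.glob("*.csv"))

# num_workers > 0 parallelizes the per-sample reads across processes
loader = DataLoader(CSVDataset(paths), batch_size=2, num_workers=0)
batch = next(iter(loader))
print(batch.shape)  # torch.Size([2, 2, 2])
```

With this structure, every epoch re-opens and re-parses every CSV from disk, so the epoch time is likely dominated by I/O and parsing rather than the model itself.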