I’m looking for suggestions on how to speed up processing time when storing/accessing large amounts of data for training. Presently, I am using Pandas DataFrames saved to CSV. Then, in my CustomDataset, I load just the part of the CSV I need at any given time (a rough sketch of this is below).
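For context, this is roughly what my current setup looks like; the names and the assumption that the last column is the label are illustrative, not my exact code:

```python
import pandas as pd
import torch
from torch.utils.data import Dataset

class CustomDataset(Dataset):
    """Rough sketch of my current per-sample CSV loading (names are made up)."""

    def __init__(self, csv_path, num_rows):
        self.csv_path = csv_path
        self.num_rows = num_rows

    def __len__(self):
        return self.num_rows

    def __getitem__(self, idx):
        # Read a single row: skip the header plus the first `idx` data rows.
        row = pd.read_csv(self.csv_path, skiprows=1 + idx, nrows=1, header=None)
        values = row.to_numpy(dtype="float32").squeeze(0)
        # Illustrative assumption: last column is the label.
        return torch.from_numpy(values[:-1]), torch.tensor(values[-1])
```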
But this has been quite slow, as the CSV file is over 300 MB (and it might grow later, too). Loading the entire file and accessing the parts I need via .iloc is even slower. So I wanted to check what you all use. I see there are a few options:
- Pandas DataFrames
- NumPy arrays
- Tensors saved to .pt files
What have you found to work best for performance?
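In case it helps frame answers, here is a minimal sketch of what I imagined the last two options would look like after a one-time conversion from the CSV. The file names and shapes are placeholders, not a recommendation:

```python
import numpy as np
import torch

# Synthetic stand-in for the converted CSV (shape is a placeholder).
data = torch.randn(100_000, 32)

# Option A: save the full tensor once, then load it in one shot at train time.
torch.save(data, "train.pt")
loaded = torch.load("train.pt")   # one disk read; row access is then in-RAM
row = loaded[42]

# Option B: memory-mapped NumPy array, so only touched rows are paged in.
np.save("train.npy", data.numpy())
arr = np.load("train.npy", mmap_mode="r")
row_np = torch.from_numpy(np.array(arr[42]))  # copy the slice before wrapping
```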