I have a large dataset (about 10GB in total, with each sample around 100MB), and I want to know whether it is wise to preload all of the data.
I tried loading everything up front and storing it as an attribute of the Dataset, so the getitem function only slices out a piece of it and applies data augmentation.
Since the data augmentation is time-consuming, I used multiple worker processes in the data_loader, but I found that it became very slow.
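For reference, here is a minimal sketch of the kind of setup I mean. It is not my real code: the names `PreloadedDataset` and `make_loader` are made up for this post, the random tensors stand in for the samples I actually load from disk, the sizes are tiny placeholders, and the added noise is only a stand-in for the real augmentation.

```python
import torch
from torch.utils.data import Dataset, DataLoader


class PreloadedDataset(Dataset):
    """Loads every sample up front and keeps it as an attribute."""

    def __init__(self, num_samples=4, sample_len=1024, crop_len=256):
        # Placeholder for the real loading step: small random tensors
        # stand in for the large samples read from disk.
        self.data = [torch.randn(sample_len) for _ in range(num_samples)]
        self.crop_len = crop_len

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        sample = self.data[idx]
        # Slice a random window out of the preloaded sample.
        max_start = sample.numel() - self.crop_len
        start = torch.randint(0, max_start + 1, (1,)).item()
        crop = sample[start:start + self.crop_len]
        # Stand-in for the expensive augmentation step.
        return crop + 0.01 * torch.randn_like(crop)


def make_loader(dataset, num_workers):
    # num_workers > 0 makes the DataLoader start that many worker processes.
    return DataLoader(dataset, batch_size=2, shuffle=True,
                      num_workers=num_workers)
```

I then iterate over something like `make_loader(dataset, num_workers=N)` with N greater than 0, and that is the configuration that becomes slow for me.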
I wonder whether those sub-processes share the Dataset's memory, or whether the dataset is copied once for each sub-process?
Thank you