[SOLVED] About DataLoader pre-processing

oasjd7 · October 4, 2018, 2:38pm

Hi, all.

I have a question about the DataLoader.

Every time I train the network, I load the data and pre-process it with DataLoader.
But I think this is inefficient, because I spend time pre-processing the same data on every run.
How can I save the pre-processed data so that I can load it directly and skip the pre-processing step?

justusschock · October 4, 2018, 10:11pm

You could preprocess the data once (using the current dataset) and save the preprocessed data as tensors (using torch.save). You could then load the already preprocessed data with torch.load by writing a new dataset that works on the preprocessed data. If you have enough RAM to hold your whole dataset, you could load everything in your new dataset’s __init__. If that’s not the case, just load each sample in your new dataset’s __getitem__.
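
A minimal sketch of that idea, under some assumptions: RawDataset is a hypothetical stand-in for your current dataset (its "preprocessing" is just placeholder arithmetic), and cache_dataset, CachedDataset and the one-file-per-sample layout are illustrative names and choices, not part of PyTorch. Only torch.save, torch.load, Dataset and DataLoader are real library calls here.

```python
import os
import tempfile

import torch
from torch.utils.data import DataLoader, Dataset


class RawDataset(Dataset):
    """Hypothetical stand-in for the existing dataset with costly preprocessing."""

    def __init__(self, n=8):
        self.n = n

    def __len__(self):
        return self.n

    def __getitem__(self, idx):
        x = torch.full((3,), float(idx))
        # Placeholder for the real (expensive) preprocessing.
        return (x - 1.0) / 2.0, idx % 2


def cache_dataset(dataset, cache_dir):
    """Run the preprocessing once and save each sample to its own file."""
    os.makedirs(cache_dir, exist_ok=True)
    for i in range(len(dataset)):
        torch.save(dataset[i], os.path.join(cache_dir, f"{i}.pt"))
    return len(dataset)


class CachedDataset(Dataset):
    """Serves samples that were already preprocessed and saved to disk."""

    def __init__(self, cache_dir, length, in_memory=False):
        self.cache_dir = cache_dir
        self.length = length
        # Enough RAM: load everything once in __init__.
        # Otherwise: leave this as None and load lazily in __getitem__.
        self.samples = [self._load(i) for i in range(length)] if in_memory else None

    def _load(self, idx):
        return torch.load(os.path.join(self.cache_dir, f"{idx}.pt"))

    def __len__(self):
        return self.length

    def __getitem__(self, idx):
        if self.samples is not None:
            return self.samples[idx]
        return self._load(idx)


# One-off step: preprocess and write the cache (a temp dir here, for illustration).
cache_dir = tempfile.mkdtemp()
n = cache_dataset(RawDataset(), cache_dir)

# Every later training run: read the cached samples only.
loader = DataLoader(CachedDataset(cache_dir, n, in_memory=True), batch_size=4)
for inputs, labels in loader:
    pass  # the training step would go here
```

In a real setup the caching step would live in a separate script that you run once, and the training script would only construct CachedDataset. Set in_memory=False when the data does not fit in RAM.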

oasjd7 · October 5, 2018, 1:15am

Thanks!
I’ll try to use torch.save!