Efficient Data API

Selcuk_Caglar · July 8, 2020, 6:35pm

Hello. The DataLoader can be extremely important for training speed.
I was looking for an efficient data API and came across the TensorFlow data API, which can load data into memory in different ways.
https://www.tensorflow.org/guide/data_performance

There isn’t any ready-to-use project like this for PyTorch.

ptrblck · July 10, 2020, 8:48am

I’m not sure I understand the claim.
Are the Dataset and DataLoader utilities not working for your use case, or is there a specific project you find missing?

Selcuk_Caglar · July 16, 2020, 12:30pm

The PyTorch DataLoader is working well. I am talking about a more efficient data loader, because data loading is a bottleneck for training.
The (new) TensorFlow data API can load data while the optimization process is running. The data loading process doesn’t depend on the training process. It looks like asynchronous streaming.

ptrblck · July 16, 2020, 11:32pm

The DataLoader implementation uses multiprocessing (specified by num_workers) to load the next batches in the background, while the training is executed, so the functionality should be similar.
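
As a rough sketch of what that looks like in practice: the toy dataset `RandomVectors`, the helper `build_loader`, the linear model, and all sizes below are made up for illustration and are not from this thread. The only parts that matter are `num_workers` (background worker processes that prepare upcoming batches) and, as an optional extra, `pin_memory` with `non_blocking=True` for host-to-GPU copies.

```python
import torch
from torch.utils.data import DataLoader, Dataset


class RandomVectors(Dataset):
    """Hypothetical toy dataset: random feature vectors with binary labels."""

    def __init__(self, size=256, dim=8):
        generator = torch.Generator().manual_seed(0)
        self.features = torch.randn(size, dim, generator=generator)
        self.labels = torch.randint(0, 2, (size,), generator=generator)

    def __len__(self):
        return len(self.features)

    def __getitem__(self, index):
        # A real dataset would do its expensive work here
        # (disk reads, decoding, augmentation).
        return self.features[index], self.labels[index]


def build_loader(num_workers=2, batch_size=32):
    # num_workers > 0 starts worker processes that load the next batches
    # in the background while the main process runs the training step.
    return DataLoader(
        RandomVectors(),
        batch_size=batch_size,
        shuffle=True,
        num_workers=num_workers,
        pin_memory=torch.cuda.is_available(),  # only useful with a GPU
    )


if __name__ == "__main__":
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model = torch.nn.Linear(8, 2).to(device)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    loss_fn = torch.nn.CrossEntropyLoss()

    for features, labels in build_loader(num_workers=2):
        # With pinned memory, non_blocking=True allows the copy to overlap
        # with computation on the GPU.
        features = features.to(device, non_blocking=True)
        labels = labels.to(device, non_blocking=True)
        optimizer.zero_grad()
        loss = loss_fn(model(features), labels)
        loss.backward()
        optimizer.step()
```

A reasonable starting point is to raise `num_workers` step by step and check whether the time per iteration stops improving; the best value depends on the CPU, the storage, and how expensive `__getitem__` is.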

However, if you have other suggestions, please add them to this feature request.