Efficient data API

Hello. The data loader can be extremely important for training speed.
I was looking for an efficient data API and came across the TensorFlow data API (tf.data), which can load data into memory in different ways.

There doesn’t seem to be any ready-to-use equivalent for PyTorch.

I’m not sure I understand the claim.
Are the Dataset and DataLoader utilities not working for your use case or which project is missing?

The PyTorch DataLoader is working well; I am talking about a more efficient data loader, since data loading is a bottleneck for training.
The (new) TensorFlow data API can load data while the optimization process is going on, so the data loading process doesn’t depend on the training process. It looks like asynchronous streaming.
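To illustrate the idea of asynchronous streaming, here is a minimal, dependency-free sketch of background prefetching (the concept behind `tf.data`'s `prefetch`): a worker thread fills a bounded queue with batches while the consumer, standing in for the training loop, drains it. `load_batch` and `buffer_size` are hypothetical names for this sketch, not part of any library.

```python
# Sketch of asynchronous prefetching: a background thread prepares
# batches into a bounded queue while the training loop consumes them.
import queue
import threading
import time

def load_batch(i):
    # Hypothetical stand-in for expensive I/O or preprocessing.
    time.sleep(0.01)
    return [i] * 4

def prefetcher(num_batches, buffer_size=2):
    q = queue.Queue(maxsize=buffer_size)
    sentinel = object()

    def worker():
        for i in range(num_batches):
            q.put(load_batch(i))  # blocks when the buffer is full
        q.put(sentinel)           # signal that loading is done

    threading.Thread(target=worker, daemon=True).start()
    while True:
        batch = q.get()
        if batch is sentinel:
            break
        yield batch

for batch in prefetcher(3):
    # A training step would run here while the next batch loads.
    print(batch)
```

The bounded queue is the key design choice: it lets loading run ahead of training by at most `buffer_size` batches, overlapping the two without unbounded memory growth.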

The DataLoader implementation uses multiprocessing (specified by num_workers) to load the next batches in the background while the training is executed, so the functionality should be similar.
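A minimal sketch of this, assuming `torch` is installed and using a dummy in-memory dataset: setting `num_workers > 0` spawns worker processes that prepare upcoming batches while the main process runs the training step.

```python
# Sketch: overlapping data loading with training via DataLoader workers.
import torch
from torch.utils.data import DataLoader, TensorDataset

# Dummy dataset: 256 samples with 10 features each.
data = torch.randn(256, 10)
labels = torch.randint(0, 2, (256,))
dataset = TensorDataset(data, labels)

# num_workers=2 launches two worker processes that fetch batches
# in the background while the main process is busy training.
loader = DataLoader(dataset, batch_size=32, num_workers=2)

for inputs, targets in loader:
    # Training step goes here; workers keep loading ahead.
    pass
```

With `num_workers=0`, loading happens synchronously in the main process, which is where the bottleneck described above typically appears.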

However, if you have other suggestions, please add them to this feature request.