Loading data from the dataloader takes too much time. I would like to have two processes running in parallel: one that loads data into batches and puts them into a shared queue, and another that performs the training on the GPU. That way the GPU could be fully utilized instead of waiting for the data to load. Is this possible?
The torch.utils.data.DataLoader class should do the trick.
https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader
The following blog post could also be useful: https://stanford.edu/~shervine/blog/pytorch-how-to-generate-data-parallel
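The relevant knob is the `num_workers` argument: with `num_workers > 0`, the DataLoader spawns background worker processes that load and collate batches into an internal queue while the main process runs the training loop, which is exactly the producer/consumer setup described above. A minimal sketch (the random `TensorDataset` here is just a hypothetical stand-in for your actual dataset):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Toy stand-in dataset: 1000 samples of 10 features each.
data = torch.randn(1000, 10)
labels = torch.randint(0, 2, (1000,))
dataset = TensorDataset(data, labels)

loader = DataLoader(
    dataset,
    batch_size=32,
    shuffle=True,
    num_workers=2,   # 2 background processes prepare batches in parallel
    pin_memory=True, # page-locked host memory speeds up CPU-to-GPU copies
)

for idx, (x, y) in enumerate(loader):
    # x and y arrive already batched by the workers;
    # move them to the GPU and run the training step here.
    pass
```

With `pin_memory=True` you can additionally call `x.to(device, non_blocking=True)` so the host-to-GPU copy overlaps with computation.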
When I iterate over the dataloader:

    for idx, sample in enumerate(dataloader):
        # training steps
The first line takes about a minute to load the data. I would like the data loading and the training to run in parallel.