How to write a dataset with a queue for parallelized pre-processing

Hi,

I would like to write a PyTorch dataset that does the pre-processing in parallel and stores the results in a queue, while __getitem__ picks from the pre-computed items in that queue.

In my case, the DataLoader workers are not very efficient when the pre-processing (which uses an external module) happens inside the dataset, and it can get very slow. Because of that, my GPU is not fully utilized and sits mostly idle.
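
Something along these lines is what I have in mind (just a rough sketch of the idea; `expensive_preprocess` stands in for the external module, and names like `QueueDataset`, `producer`, and `num_producers` are placeholders I made up):

```python
import multiprocessing as mp

import torch
from torch.utils.data import Dataset


def expensive_preprocess(raw_item):
    # placeholder for the slow pre-processing done by the external module
    return torch.tensor(raw_item, dtype=torch.float32)


def producer(raw_items, queue):
    # each producer process pre-processes its shard and pushes results
    # into the shared queue
    for item in raw_items:
        queue.put(expensive_preprocess(item))


class QueueDataset(Dataset):
    def __init__(self, raw_items, num_producers=4, max_queue_size=128):
        self.length = len(raw_items)
        self.queue = mp.Queue(maxsize=max_queue_size)
        # split the raw items across producer processes
        shards = [raw_items[i::num_producers] for i in range(num_producers)]
        self.producers = [
            mp.Process(target=producer, args=(shard, self.queue), daemon=True)
            for shard in shards
        ]
        for p in self.producers:
            p.start()

    def __len__(self):
        return self.length

    def __getitem__(self, index):
        # the index is ignored: items come back in whatever order the
        # producers finish them, so shuffling has no real effect
        return self.queue.get()
```

The idea would be to consume it with a plain DataLoader using num_workers=0 (inside an `if __name__ == "__main__":` guard), since the producer processes already do the parallel work. Is this a reasonable approach, or is there a better pattern?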

Can anyone help?
