How to write a dataset with a queue for parallelized pre-processing

Hi,

I would like to write a PyTorch dataset that does the pre-processing in parallel and stores the results in a queue, while __getitem__ picks from the pre-computed items in that queue.

In my case, the DataLoader workers are not very efficient when the pre-processing (which uses an external module) happens inside the dataset, and it can get very slow. Because of that, my GPU is not fully utilized and sits mostly idle.
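
Something along these lines is what I have in mind (just a rough sketch of the idea; `expensive_preprocess` stands in for the external module, and names like `QueueDataset`, `producer`, and `num_producers` are placeholders I made up):

```python
import multiprocessing as mp

import torch
from torch.utils.data import Dataset


def expensive_preprocess(raw_item):
    # placeholder for the slow pre-processing done by the external module
    return torch.tensor(raw_item, dtype=torch.float32)


def producer(raw_items, queue):
    # each producer process pre-processes its shard and pushes results
    # into the shared queue
    for item in raw_items:
        queue.put(expensive_preprocess(item))


class QueueDataset(Dataset):
    def __init__(self, raw_items, num_producers=4, max_queue_size=128):
        self.length = len(raw_items)
        self.queue = mp.Queue(maxsize=max_queue_size)
        # split the raw items across producer processes
        shards = [raw_items[i::num_producers] for i in range(num_producers)]
        self.producers = [
            mp.Process(target=producer, args=(shard, self.queue), daemon=True)
            for shard in shards
        ]
        for p in self.producers:
            p.start()

    def __len__(self):
        return self.length

    def __getitem__(self, index):
        # the index is ignored: items come back in whatever order the
        # producers finish them, so shuffling has no real effect
        return self.queue.get()
```

The idea would be to consume it with a plain DataLoader using num_workers=0 (inside an `if __name__ == "__main__":` guard), since the producer processes already do the parallel work. Is this a reasonable approach, or is there a better pattern?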

Can anyone help?
