Error while Multiprocessing in DataLoader

FWIW – if using pytorch-lightning, the suggested import solution did not help. In my case, a custom dataset was generating ALL data on the fly. Altering it to pregenerate the data made the issue go away, even with a non-zero number of workers.

Hi sneiman, could you please add more details about "generating ALL data on the fly" and "pregenerating the data"? I'm also using pytorch-lightning and this is causing me real trouble.

My initial synthetic dataset generated its data on demand – meaning the dataset did not have any data pre-cached, like a generator in Python. The data was not created until the `__getitem__()` call. This caused the problem unless I set the number of workers to 0.

I guessed that perhaps not having the data available was confusing the splitting of the work among the workers. So I changed the dataset to create all of its data when the constructor was called. This solved the problem – I usually run this with 8 workers now.
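To make this concrete, here is a minimal sketch of the two patterns I mean – the class names, shapes, and sizes are made up for illustration, not my actual code:

```python
import torch
from torch.utils.data import Dataset

class OnDemandDataset(Dataset):
    """Synthesizes each sample inside __getitem__ -- nothing is cached.
    This is the pattern that failed for me with num_workers > 0."""
    def __init__(self, length=1000):
        self.length = length

    def __len__(self):
        return self.length

    def __getitem__(self, idx):
        # The sample only exists once a worker asks for it.
        x = torch.randn(3, 32, 32)
        y = torch.randint(0, 10, (1,)).item()
        return x, y

class PregeneratedDataset(Dataset):
    """Creates all samples up front in the constructor.
    This is the pattern that worked for me with 8 workers."""
    def __init__(self, length=1000):
        self.data = [
            (torch.randn(3, 32, 32), torch.randint(0, 10, (1,)).item())
            for _ in range(length)
        ]

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        # Data already exists; __getitem__ is just a lookup.
        return self.data[idx]
```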

Hope this helps.

sneiman

Thanks for your reply! This is curious since, as far as I know, the DataLoader workers will prefetch batches automatically (sketch below). I could not verify this, as I have a lot of videos that will not fit into memory. I'll try to dig deeper. Thank you again.
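For what it's worth, the prefetching I mean is the DataLoader's own behavior – each worker prepares batches ahead of time. A minimal sketch with illustrative values (the dummy TensorDataset is just there to make it runnable):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# A trivial in-memory dataset, purely so the example runs.
dataset = TensorDataset(torch.randn(100, 3, 32, 32),
                        torch.randint(0, 10, (100,)))

# With num_workers > 0, each worker prefetches prefetch_factor
# batches ahead of time (prefetch_factor defaults to 2 per worker).
loader = DataLoader(dataset, batch_size=16,
                    num_workers=4, prefetch_factor=2)

# Note: with workers, iterate under a main guard so subprocess
# spawning works on all platforms.
if __name__ == "__main__":
    for x, y in loader:
        pass
```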

I thought it was odd as well – I had imagined the data loaders simply make lots of calls to `__getitem__()`. However, PTL does do a lot of its own multiprocessing management, particularly with DDP.

Good luck,

s