FWIW, if you're using pytorch-lightning, the suggested import solution did not help. In my case, a custom dataset was generating ALL of its data on the fly. Altering it to pre-generate the data up front made the issue go away, even with non-zero workers.
Hi sneiman, could you please add more details about "generating ALL data on the fly" and "pre-generate the data up front"? I'm also using pytorch-lightning and this is really causing me trouble.
My initial synthetic dataset generated the data on demand, meaning the dataset did not have any data pre-cached, like a generator in Python. The data was not created until the `__getitem__()` call. This had the problem, unless I set the number of workers to 0.
I guessed that perhaps not having data available was confusing the division of the loading job among the workers. So, I changed the dataset to create all of its data when the constructor was called. This solved the problem; I usually run this with 8 workers now.
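For reference, here is a minimal sketch of what "create all of the data in the constructor" might look like. The class and parameter names are my own illustration, not from my actual code:

```python
# Hypothetical sketch: pre-generate every sample in __init__ so that
# __getitem__ only indexes into already-built tensors, instead of
# creating data on demand inside __getitem__.
import torch
from torch.utils.data import Dataset, DataLoader

class PregeneratedDataset(Dataset):
    def __init__(self, num_samples=1024, dim=16):
        # All data is materialized here, once, before any worker starts.
        self.data = torch.randn(num_samples, dim)
        self.labels = torch.randint(0, 2, (num_samples,))

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        # No data creation here -- just a lookup.
        return self.data[idx], self.labels[idx]

# num_workers=0 shown for portability; in my setup I raise it to 8
# once the data is pre-generated.
loader = DataLoader(PregeneratedDataset(), batch_size=32, num_workers=0)
```

The trade-off is memory: this only works if the whole dataset fits in RAM, which is why it suits synthetic data.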
Hope this helps.
Thanks for your reply! This is curious, since as far as I know the dataloader workers will prefetch batches automatically. I could not verify your approach, as I have a lot of videos that will not fit into memory. I'll try to dig deeper. Thank you again.
I thought it was odd as well; I had imagined the dataloaders simply make lots of calls to `__getitem__()`. However, PTL does do a lot of its own multiprocessing management, particularly with DDP.