FWIW, if you're using pytorch-lightning, the suggested import solution did not help. In my case, a custom dataset was generating ALL of its data on the fly. Altering it to pre-generate the data up front made the issue go away, even with non-zero workers.
Hi sneiman, could you please add more details about "generating ALL data on the fly" and "pre-generate the data up front"? I'm also using pytorch-lightning and this is really causing me trouble.
My initial synthetic dataset generated the data on demand, meaning the dataset did not have any data pre-cached, like a generator in Python. The data was not created until the `__getitem__()` call. This had the problem, unless I set the number of workers to 0.
I guessed that perhaps not having data available was confusing the division of the loading job among the workers. So, I changed the dataset to create all of its data when the constructor was called. This solved the problem; I usually run this with 8 workers now.
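For reference, here is a minimal sketch of what "create all of the data in the constructor" might look like. The class and parameter names are my own illustration, not from my actual code:

```python
# Hypothetical sketch: pre-generate every sample in __init__ so that
# __getitem__ only indexes into already-built tensors, instead of
# creating data on demand inside __getitem__.
import torch
from torch.utils.data import Dataset, DataLoader

class PregeneratedDataset(Dataset):
    def __init__(self, num_samples=1024, dim=16):
        # All data is materialized here, once, before any worker starts.
        self.data = torch.randn(num_samples, dim)
        self.labels = torch.randint(0, 2, (num_samples,))

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        # No data creation here -- just a lookup.
        return self.data[idx], self.labels[idx]

# num_workers=0 shown for portability; in my setup I raise it to 8
# once the data is pre-generated.
loader = DataLoader(PregeneratedDataset(), batch_size=32, num_workers=0)
```

The trade-off is memory: this only works if the whole dataset fits in RAM, which is why it suits synthetic data.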
Hope this helps.
Thanks for your reply! This is curious, since as far as I know the dataloader workers will prefetch batches automatically. I could not verify your approach, as I have a lot of videos that will not fit into memory. I'll try to dig deeper. Thank you again.
I thought it was odd as well; I had imagined the dataloaders simply make lots of calls to `__getitem__()`. However, PTL does do a lot of its own multiprocessing management, particularly with DDP.