Multiprocessing dataloader with num_workers = 1

amirhf · December 9, 2022, 5:30pm

Hello, I have created a custom map-style dataset and I iterate over it with the native PyTorch DataLoader. I use num_workers = 1 for debugging and for understanding the bottlenecks with the PyTorch profiler. Here is what I see in PyTorch Profiler’s trace tab:

I’m confused why 7 different threads are still created, each waiting for another thread to finish in every iteration of the dataloader. Can someone help me understand this and guide me on how to get a truly single-process dataloader?

ejguan · December 12, 2022, 7:18pm

To get a single-process dataloader, you should specify num_workers=0.

amirhf · December 12, 2022, 8:24pm

I wonder why. It doesn’t sound intuitive!

ejguan · December 27, 2022, 2:56pm

num_workers is the number of subprocesses DataLoader uses for data loading, so num_workers=1 still starts one worker subprocess alongside the main process. Setting num_workers=0 means everything runs in the main process.
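
A minimal sketch of how one might check this, assuming a toy dataset: PidDataset and loader_pids are hypothetical names made up for illustration, not part of the original poster's code. Each item records the PID of the process that loaded it, together with the worker id reported by torch.utils.data.get_worker_info(), which returns None when called in the main process.

```python
import os

from torch.utils.data import DataLoader, Dataset, get_worker_info


class PidDataset(Dataset):
    """Hypothetical map-style dataset that reports which process loaded each item."""

    def __len__(self):
        return 4

    def __getitem__(self, idx):
        info = get_worker_info()  # None when running in the main process
        worker_id = -1 if info is None else info.id
        return idx, os.getpid(), worker_id


def loader_pids(num_workers):
    """Iterate once and collect the PIDs and worker ids that produced the items."""
    loader = DataLoader(PidDataset(), batch_size=2, num_workers=num_workers)
    pids, worker_ids = set(), set()
    for _, pid_batch, worker_batch in loader:
        # The default collate function turns the Python ints into tensors.
        pids.update(pid_batch.tolist())
        worker_ids.update(worker_batch.tolist())
    return pids, worker_ids


if __name__ == "__main__":
    print("main process PID:", os.getpid())
    for n in (0, 1):
        pids, worker_ids = loader_pids(n)
        print(f"num_workers={n}: loaded by PIDs {pids}, worker ids {worker_ids}")
```

With num_workers=0 the only PID collected is that of the main process and the worker id stays at the -1 placeholder. With num_workers=1 the items should come from a separate worker subprocess, while the main process only collects the finished batches.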