I’m new to this forum, and also quite new to PyTorch.
I’m running into an issue where the DataLoader seems to be quite slow, and I’m not sure what the bottleneck is. I’m running on a Ryzen 5700, 32 GB of memory, and a 4700 Super.
I’ve played around with that a bit. I get a huge increase in performance when I set num_workers to 0, which reduces the loading time to about 0.013 s!
Increasing it to 1, I get 2.8 seconds, and it keeps growing up to 22 seconds at 8 workers. Does this make sense?
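For anyone who wants to reproduce this, a timing loop along these lines (a toy RandomDataset used here as a placeholder for my real data, and batch_size=32 is just an example) measures the iteration time for different num_workers settings:

```python
import time
import torch
from torch.utils.data import Dataset, DataLoader

# Toy stand-in for the real dataset; replace with your own.
class RandomDataset(Dataset):
    def __init__(self, n=1000):
        self.n = n

    def __len__(self):
        return self.n

    def __getitem__(self, idx):
        return torch.randn(3, 224, 224), 0

if __name__ == "__main__":
    ds = RandomDataset()
    for workers in (0, 1, 4, 8):
        loader = DataLoader(ds, batch_size=32, num_workers=workers)
        start = time.perf_counter()
        for batch in loader:
            pass  # iterate only, to isolate data-loading time
        print(f"num_workers={workers}: {time.perf_counter() - start:.3f} s")
```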
Increasing the num_workers lowers the loop time from 7.5 sec at 0 workers to 2 sec at 8 workers.
Also, the results for subsequent epochs are usually much faster. I think the DataLoader or the OS is caching some information about the location of the files on the storage.
Are you using a Windows machine?
I’m using a Windows 11 machine with Python 3.10.14 and PyTorch 2.2.2.
Your results look much more consistent with or without the data loader than mine.
Looking at my system resources, I see very limited activity on the PC with 8 workers:
SSD: no notable percentage difference in use (1~2% of max)
CPU: 10%
Memory: 3 GB increase in use (14.7 to 17.7 GB).
What could be limiting my performance compared to yours?
This separate serialization means that you should take two steps to ensure you are compatible with Windows while using multi-process data loading:
Wrap most of your main script’s code within an if __name__ == '__main__': block, to make sure it doesn’t run again (most likely generating errors) when each worker process is launched. You can place your dataset and DataLoader instance creation logic here, as it doesn’t need to be re-executed in workers.
Make sure that any custom collate_fn, worker_init_fn or dataset code is declared as top-level definitions, outside of the __main__ check. This ensures that they are available in worker processes. (This is needed since functions are pickled as references only, not bytecode.) A minimal sketch of this layout follows below.
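Here is a minimal sketch of what that structure looks like on Windows; MyDataset and my_collate are hypothetical placeholders for your own definitions:

```python
import torch
from torch.utils.data import Dataset, DataLoader

# Top-level definitions: workers can look these up by reference when unpickling.
class MyDataset(Dataset):
    def __init__(self, n=256):
        self.data = torch.randn(n, 3, 32, 32)

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        return self.data[idx], 0

def my_collate(batch):
    # A custom collate_fn must also live at module level on Windows.
    images = torch.stack([item[0] for item in batch])
    labels = torch.tensor([item[1] for item in batch])
    return images, labels

if __name__ == "__main__":
    # Code under the __main__ guard runs only once, not again when each
    # worker process imports this module.
    ds = MyDataset()
    loader = DataLoader(ds, batch_size=32, num_workers=4, collate_fn=my_collate)
    for images, labels in loader:
        pass
```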
I think for this to work you can’t use a Jupyter notebook.
Write a .py script instead.
I tried defining the dataset outside of main, but unfortunately it had no effect.
Reading the link you’ve provided, this seems to be a long-running issue on Windows that was never resolved. I guess sticking with num_workers=0 (or running on Linux) is the best way forward.