Does DataLoader num_workers relate to GPU memory?

If I increase num_workers from 0 to 4, could the CUDA out of memory problem be solved? My RAM is enough, but the GPU memory is exploding… help me please.

No, increasing num_workers in the DataLoader uses multiprocessing to load the data from the Dataset and would not avoid an out of memory error on the GPU.
To solve the latter you would have to reduce the memory usage, e.g. by lowering the batch size or by using torch.utils.checkpoint.
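
For the checkpointing route, here is a minimal sketch (the model and layer sizes are made up for illustration): the wrapped block's activations are not stored during the forward pass and are recomputed in the backward pass, trading extra compute for lower GPU memory.

```python
import torch
from torch.utils.checkpoint import checkpoint

class Net(torch.nn.Module):
    def __init__(self):
        super().__init__()
        # Toy block; in practice you would wrap the most memory-hungry part
        self.block = torch.nn.Sequential(
            torch.nn.Linear(1024, 1024), torch.nn.ReLU(),
            torch.nn.Linear(1024, 1024), torch.nn.ReLU(),
        )
        self.head = torch.nn.Linear(1024, 10)

    def forward(self, x):
        # Activations of self.block are recomputed during backward instead of stored.
        # use_reentrant=False is recommended on recent PyTorch versions;
        # drop the argument on older releases.
        x = checkpoint(self.block, x, use_reentrant=False)
        return self.head(x)

model = Net().cuda()
x = torch.randn(8, 1024, device="cuda", requires_grad=True)
model(x).sum().backward()
```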

Hi, ptrblck! Do you know why increasing num_workers to 2, 4 or 8 wouldn’t decrease the runtime?

You would have to profile your code and check how long the data loading takes. E.g. your overall data loading might already be faster than the model training iteration, so the time to load each batch is hidden behind the compute; in that case speeding up the data loading would of course not yield any performance improvement. On the other hand, you could see a data loading bottleneck which your system cannot speed up further, e.g. due to the limited read speed of your SSD.
Generally, I would recommend profiling the code to see where the bottlenecks are and then trying to optimize them; a simple timing sketch is shown below.
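
For example, a minimal timing sketch along those lines: the gap between the end of one iteration and the arrival of the next batch approximates the data loading time, the rest is the training step. The dataset, model, and hyperparameters below are dummies just to make it runnable, so replace them with your own setup.

```python
import time
import torch
from torch.utils.data import DataLoader, TensorDataset

# Dummy data and model; swap in your own DataLoader and training objects.
# (On Windows/macOS, wrap the loop in an `if __name__ == "__main__":` guard
# when using num_workers > 0.)
dataset = TensorDataset(torch.randn(2048, 128), torch.randint(0, 10, (2048,)))
loader = DataLoader(dataset, batch_size=64, num_workers=2)
model = torch.nn.Linear(128, 10).cuda()
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

data_times, iter_times = [], []
end = time.perf_counter()
for inputs, targets in loader:
    data_times.append(time.perf_counter() - end)  # time spent waiting for the batch

    inputs, targets = inputs.cuda(), targets.cuda()
    optimizer.zero_grad()
    loss = criterion(model(inputs), targets)
    loss.backward()
    optimizer.step()

    torch.cuda.synchronize()  # CUDA calls are asynchronous; sync before timing
    iter_times.append(time.perf_counter() - end)
    end = time.perf_counter()

print(f"avg data loading: {sum(data_times) / len(data_times):.4f}s, "
      f"avg iteration: {sum(iter_times) / len(iter_times):.4f}s")
```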

EDIT: also explained in this answer from your double post.

Thank you so much!! I will try your suggestion.

May I ask how to test the batch loading speed & the model training iteration speed?

Also, if I set num_workers = multiprocessing.cpu_count(), which maximizes the CPU usage, and the runtime still does not improve, does that mean there's no way to improve the runtime?

I'm new to PyTorch, hence these silly questions lol

I’ve figured it out. Setting num_workers = 0 actually speeds up my code because the loading time itself is short, while spawning new worker processes takes much longer. Therefore, using more workers, even multiprocessing.cpu_count(), slows down the runs.
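
For reference, a quick way to check this on your own machine is to time a full pass over the DataLoader for a few num_workers values. The in-memory dataset below is a made-up example, and the numbers will vary with your hardware; with heavier per-sample work (e.g. JPEG decoding from disk), more workers usually help instead.

```python
import time
import torch
from torch.utils.data import DataLoader, TensorDataset

# Toy in-memory dataset: per-sample work is tiny, so worker startup and
# inter-process communication overhead can dominate.
dataset = TensorDataset(torch.randn(10_000, 128), torch.randint(0, 10, (10_000,)))

if __name__ == "__main__":  # needed for multiprocessing workers on Windows/macOS
    for num_workers in (0, 2, 4):
        loader = DataLoader(dataset, batch_size=64, num_workers=num_workers)
        start = time.perf_counter()
        for _ in loader:
            pass
        print(f"num_workers={num_workers}: {time.perf_counter() - start:.3f}s")
```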
