Guidelines for assigning num_workers to DataLoader

Well, this seems closely related to why DataLoaders exist in the first place. Let me elaborate:
Since you set num_workers = 1, only one worker process is preparing data in the background, so your GPU may have sat idle while waiting for each batch from that process. I’m less sure about the out-of-memory part: waiting on data does not in itself use any GPU memory, so I would treat the CUDA out of memory error as a possibly separate issue (batch size and model size are the usual suspects).
As you increase the number of workers, more processes run in parallel to fetch batches, using the CPU’s multiple cores to prepare the data. Data becomes available sooner than before, so the GPU no longer has to sit idle waiting for batches to load.
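
If you want to check this on your own machine, here is a rough sketch that times one pass over a dataset for a few num_workers values. SlowDataset and time_epoch are names I made up for illustration, and the time.sleep call is only a stand-in for whatever your real loading or augmentation costs, so treat the numbers as a guide rather than a benchmark.

```python
import time

import torch
from torch.utils.data import DataLoader, Dataset


class SlowDataset(Dataset):
    """Hypothetical dataset whose items are slow to produce (disk reads, decoding, ...)."""

    def __init__(self, size=64, delay=0.01):
        self.size = size
        self.delay = delay

    def __len__(self):
        return self.size

    def __getitem__(self, idx):
        time.sleep(self.delay)  # assumed stand-in for real per-item loading cost
        return torch.full((3,), float(idx))


def time_epoch(num_workers, batch_size=8):
    """Iterate once over the dataset; return (seconds elapsed, batches seen)."""
    loader = DataLoader(SlowDataset(), batch_size=batch_size, num_workers=num_workers)
    start = time.perf_counter()
    batches = sum(1 for _ in loader)
    return time.perf_counter() - start, batches


if __name__ == "__main__":  # guard is needed where workers are started with "spawn"
    for workers in (0, 1, 2, 4):
        seconds, batches = time_epoch(workers)
        print(f"num_workers={workers}: {batches} batches in {seconds:.2f}s")
```

With num_workers=0 the data is loaded in the main process. Past a certain point extra workers usually stop helping, since each one costs startup time and CPU memory, so it is worth timing a few values instead of just picking the largest.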