Guidelines for assigning num_workers to DataLoader

I stored the images in .pt format, but I do not know much about the format…

Actually, I tried num_workers=0, and loading each image is faster than with num_workers=8. However, fewer batches are loaded at a time: for example, 2 batches are loaded and trained on, then it waits several minutes to load another two batches, and so on. With num_workers=8, it loads 12 batches at once, but each image takes longer to load; I guess this is because of the CPU subprocesses. After the forward and backward passes through the NN, it again waits several minutes to load another 12 batches.
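To compare settings like these without the training step getting in the way, one option is to time a bare pass over the DataLoader. This is only a minimal sketch: RandomImages is a hypothetical stand-in for the .pt files, and time_epoch is a helper name made up for this example.

```python
import time

import torch
from torch.utils.data import DataLoader, Dataset


class RandomImages(Dataset):
    """Hypothetical stand-in for a folder of .pt image tensors."""

    def __init__(self, n=64, shape=(3, 32, 32)):
        self.n = n
        self.shape = shape

    def __len__(self):
        return self.n

    def __getitem__(self, idx):
        # A real dataset would call torch.load(path) here instead.
        return torch.randn(self.shape)


def time_epoch(dataset, batch_size, num_workers):
    """Return (seconds, number of batches) for one pass with no training step."""
    loader = DataLoader(dataset, batch_size=batch_size, num_workers=num_workers)
    start = time.perf_counter()
    n_batches = sum(1 for _ in loader)
    return time.perf_counter() - start, n_batches


if __name__ == "__main__":
    data = RandomImages()
    for workers in (0, 2, 4, 8):
        seconds, n_batches = time_epoch(data, batch_size=4, num_workers=workers)
        print(f"num_workers={workers}: {n_batches} batches in {seconds:.2f}s")
```

With tiny random tensors the worker start-up cost will probably dominate, so the numbers only become meaningful once __getitem__ does the real loading.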

The batch counts here are just examples, but they are typical of my case. I found that setting num_workers properly really depends on the number of CPUs, the number of GPUs, and the batch_size on the workstation…
If I understand it correctly: if I have 8 CPUs and use 1 GPU for my NN, and I set batch_size to 8, then each CPU takes care of one image, right? And the batches of data are transferred to the GPU as far as its memory allows, and only once training on the GPU has finished are new batches gathered?
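On the one-image-per-CPU question: as far as I know, with the default automatic batching each worker process assembles a whole batch on its own rather than one image of it. With num_workers > 0 the loader also keeps up to num_workers * prefetch_factor batches queued (prefetch_factor defaults to 2) while the GPU is busy. A small sketch to check the first point; TaggedIndices and workers_per_batch are hypothetical names made up for this example:

```python
import torch
from torch.utils.data import DataLoader, Dataset, get_worker_info


class TaggedIndices(Dataset):
    """Each sample records its index and the id of the worker that loaded it."""

    def __init__(self, n=32):
        self.n = n

    def __len__(self):
        return self.n

    def __getitem__(self, idx):
        info = get_worker_info()  # None when loading in the main process
        worker_id = -1 if info is None else info.id
        return torch.tensor([idx, worker_id])


def workers_per_batch(batch_size, num_workers, n=32):
    """For every batch, list the distinct worker ids that contributed samples."""
    loader = DataLoader(TaggedIndices(n), batch_size=batch_size, num_workers=num_workers)
    return [sorted(set(batch[:, 1].tolist())) for batch in loader]


if __name__ == "__main__":
    print(workers_per_batch(batch_size=8, num_workers=0))
    print(workers_per_batch(batch_size=8, num_workers=2))
```

If that understanding is right, every inner list holds exactly one id, meaning each batch came from a single worker (or from the main process, shown as -1, when num_workers=0).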

Finally, I set num_workers=8 and batch_size=4 to train my auto-encoder. The speed is acceptable.
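For reference, a rough sketch of a loader with those two settings. Everything else here is an assumption on my side and not part of the original setup: the make_loader helper, the cap at os.cpu_count(), pin_memory, persistent_workers, and the random stand-in data.

```python
import os

import torch
from torch.utils.data import DataLoader, TensorDataset


def make_loader(dataset, batch_size=4, requested_workers=8):
    # Cap the worker count at the number of CPU cores the OS reports.
    workers = max(1, min(requested_workers, os.cpu_count() or 1))
    return DataLoader(
        dataset,
        batch_size=batch_size,
        shuffle=True,
        num_workers=workers,
        pin_memory=torch.cuda.is_available(),  # assumption: only enabled with a CUDA GPU
        persistent_workers=True,  # assumption: keep workers alive between epochs
    )


# Hypothetical stand-in data: 16 random 3x32x32 "images".
images = TensorDataset(torch.randn(16, 3, 32, 32))
loader = make_loader(images)
print(loader.batch_size, loader.num_workers)
```

Constructing the loader does not start any worker processes; they are only launched when iteration begins.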