When setting num_workers in a DataLoader, what should I be looking at on my computer?

I did:

$ lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                6
On-line CPU(s) list:   0-5
Thread(s) per core:    1
Core(s) per socket:    6
Socket(s):             1
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 63
Model name:            Intel(R) Xeon(R) CPU E5-2690 v3 @ 2.60GHz
Stepping:              2
CPU MHz:               2596.995
BogoMIPS:              5193.99
Hypervisor vendor:     Microsoft
Virtualization type:   full
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              30720K
NUMA node0 CPU(s):     0-5
Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology cpuid pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm invpcid_single pti fsgsbase bmi1 avx2 smep bmi2 erms invpcid xsaveopt md_clear

Since the number of CPUs is 6, should I set num_workers=6? Is it the number of CPUs?

It usually refers to the number of threads. That’s usually equivalent to 2 * n_cpus.

Anyway you can do the following

import multiprocessing as mp
# cpu_count() reports logical CPUs (hardware threads), not physical cores
max_cpus = mp.cpu_count()

so you set number of workers to 2 times the cpus usually? i.e.

num_workers = max_cpus * 2

cpu_count returns the number of threads, which is usually equal to 2 times the number of CPUs. Anyway, if you set a number bigger than the real count, the effect is the same as setting the maximum.
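To see what these counts look like on a given machine, here is a minimal sketch using only the standard library. The function name `cpu_summary` is made up for illustration. It assumes that on Linux `os.sched_getaffinity(0)` is a better guide than the raw thread count, since a container or VM may expose fewer CPUs to the process than the machine has; on other platforms it falls back to the logical count.

```python
import multiprocessing as mp
import os


def cpu_summary():
    """Return logical CPU count and the CPUs this process may actually use."""
    # Logical CPUs (hardware threads) visible to the OS.
    logical = mp.cpu_count()
    # os.sched_getaffinity is not available on every platform (e.g. macOS, Windows),
    # so fall back to the logical count when it is missing.
    if hasattr(os, "sched_getaffinity"):
        usable = len(os.sched_getaffinity(0))
    else:
        usable = logical
    return {"logical": logical, "usable": usable}


print(cpu_summary())
```

For the lscpu output in the question (6 CPUs, 1 thread per core), the logical count should already be 6, so there would be no factor of 2 on that machine.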

What do you mean by bigger than the real count? Are you saying that the code we should always have is:

num_workers = max_cpus

or less?


I suggest you follow this thread; it has been active for months.

oh wow. Thanks.

Though it seems there isn’t an agreement…and it also seems it depends on the # of gpus! What a nightmare. Is there any heuristic/rough number that always makes things better but doesn’t overload?

I think at this point I don’t care about being optimal, just making it run faster than in the main thread without running the risk of overdoing it.

Yes, this kind of situation never has an exact answer. Actually, when you follow the thread, you can see that everyone got a different result using the same configuration, and some of them even got errors!

So the best approach is to make sure your model works with the default value, which is 0. Then, if you have the resources or time, you can play with different configurations to find what works best for you. I have the same problem too.
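One way to "play with different configurations" is to time a fixed number of batches for a few candidate values. The sketch below is only an illustration: `time_loader_configs` and `make_loader` are hypothetical names, and the helper works on any iterable, so it does not import torch itself. With PyTorch you would pass a factory that builds a `DataLoader` with the given `num_workers`.

```python
import time


def time_loader_configs(make_loader, worker_counts, max_batches=50):
    """Time iteration over the first `max_batches` batches for each worker count.

    make_loader: callable taking num_workers and returning an iterable of batches.
    Returns a dict mapping num_workers -> elapsed seconds.
    """
    results = {}
    for n in worker_counts:
        loader = make_loader(n)
        start = time.perf_counter()
        for i, _batch in enumerate(loader):
            if i + 1 >= max_batches:
                break
        results[n] = time.perf_counter() - start
    return results


# Hypothetical usage with PyTorch (assumes `dataset` is already defined):
# from torch.utils.data import DataLoader
# timings = time_loader_configs(
#     lambda n: DataLoader(dataset, batch_size=64, num_workers=n),
#     worker_counts=[0, 2, 4],
# )
# print(timings)
```

The timings include worker startup, so short runs can make larger values look worse than they are; treat the numbers as a rough comparison on your own machine rather than a general result.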

Good luck

Cool! Thanks!

My hunch is that at the very least one can put num_workers = 2 and always get a benefit, as long as there is at least a single GPU and 2 real CPUs.
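That hunch could be written down roughly as follows. This is a sketch of the rule of thumb above, not a tested recommendation: `conservative_num_workers` is a made-up helper, the standard library cannot tell "real" cores from hardware threads, so the usable CPU count is used as a stand-in, and the GPU condition is not checked at all.

```python
import os


def conservative_num_workers(available=None, cap=2):
    """Return `cap` workers if at least `cap` CPUs are usable, else 0 (main process only)."""
    if available is None:
        # Assumption: CPUs usable by this process approximate "real CPUs".
        if hasattr(os, "sched_getaffinity"):
            available = len(os.sched_getaffinity(0))
        else:
            available = os.cpu_count() or 1
    return cap if available >= cap else 0


num_workers = conservative_num_workers()
print(num_workers)
```

Falling back to 0 keeps data loading in the main process, which is the DataLoader default mentioned earlier in the thread.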

That is the sort of minimal advice I was hoping to hear. I know this is not certain, but I trust it's probably right.
