CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
On-line CPU(s) list: 0-5
Thread(s) per core: 1
Core(s) per socket: 6
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model name: Intel(R) Xeon(R) CPU E5-2690 v3 @ 2.60GHz
CPU MHz: 2596.995
Hypervisor vendor: Microsoft
Virtualization type: full
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 30720K
NUMA node0 CPU(s): 0-5
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology cpuid pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm invpcid_single pti fsgsbase bmi1 avx2 smep bmi2 erms invpcid xsaveopt md_clear
Since the number of CPUs is 6, should I set
num_workers=6? Is it the number of CPUs?
It usually refers to the number of threads. That’s usually equivalent to 2 * n_cpus.
Anyway you can do the following
import multiprocessing as mp
max_cpus = mp.cpu_count()
So you usually set the number of workers to 2 times the number of CPUs? I.e.
num_workers = max_cpus * 2
cpu_count() returns the number of logical CPUs (hardware threads), which is usually equal to 2 times the number of physical cores. Anyway, if you set a number bigger than the real count, the effect is about the same as setting the max.
What do you mean by bigger than the real count? Are you saying that the code we should always have is:
num_workers = max_cpus
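For what it's worth, here is a minimal sketch of one way to cap the worker count explicitly instead of assuming an oversized value gets capped for you. The helper names `available_cpus` and `pick_num_workers` and the `reserve` parameter are made up for illustration and are not part of PyTorch. It uses `os.sched_getaffinity` where it exists (Linux), since that reflects CPU pinning, and falls back to `os.cpu_count()` otherwise.

```python
import os


def available_cpus():
    """Number of logical CPUs this process may run on (at least 1)."""
    if hasattr(os, "sched_getaffinity"):
        # Linux only: respects taskset / container CPU pinning.
        return len(os.sched_getaffinity(0))
    # Portable fallback; cpu_count() may return None on exotic platforms.
    return os.cpu_count() or 1


def pick_num_workers(requested, reserve=1):
    """Clamp `requested` to the usable CPUs, keeping `reserve` for the main process.

    Hypothetical helper: the result is only a starting point, not a tuned value.
    """
    usable = max(available_cpus() - reserve, 0)
    return max(0, min(requested, usable))


if __name__ == "__main__":
    print("logical CPUs available:", available_cpus())
    print("num_workers for a request of 12:", pick_num_workers(12))
```

The value returned could then be passed as `num_workers` to the data loader. Keeping one CPU in reserve for the main process is an assumption here, not an established rule.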
I suggest you follow this thread; it has been active for months.
Yeah I’ve since changed tune about convergence. Agree. That’s what I got when batch size was the same, so parallelisation was NOT working properly. Overheads of parallelisation don’t pay off without increasing batch size. Right?
oh wow. Thanks.
Though it seems there isn’t an agreement…and it also seems it depends on the # of gpus! What a nightmare. Is there any heuristic/rough number that always makes things better but doesn’t overload?
I think at this point I don’t care about being optimal, just making it run faster than in the main thread without running the risk of overdoing it.
Yes, this kind of situation never has an exact answer. Actually, if you follow the thread, you can see that everyone got a different result using the same configuration, and some of them even got errors!
So the best approach is to make sure your model works with the default value, which is 0; then, if you have the resources or time, you can play with different configurations to find what works best for you. I have the same problem too.
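The "try different configurations" step could be sketched roughly as below. This is only an illustration: `time_one_pass` and `fastest` are hypothetical helpers, and `make_loader` is assumed to be any callable that takes a worker count and returns something iterable, such as a function that builds a `DataLoader`. It times one full pass per setting, so the numbers include worker start-up cost and will vary between runs and machines.

```python
import time


def time_one_pass(make_loader, candidates=(0, 1, 2, 4)):
    """Return {num_workers: seconds} for one full pass over each loader."""
    timings = {}
    for n in candidates:
        loader = make_loader(n)        # build a fresh loader for this setting
        start = time.perf_counter()
        for _ in loader:               # consume every batch and discard it
            pass
        timings[n] = time.perf_counter() - start
    return timings


def fastest(timings):
    """The num_workers value with the smallest measured time."""
    return min(timings, key=timings.get)


# Usage with PyTorch (assumes `dataset` is already defined; batch_size is arbitrary):
# from torch.utils.data import DataLoader
# timings = time_one_pass(
#     lambda n: DataLoader(dataset, batch_size=64, num_workers=n)
# )
# print(timings, "-> fastest:", fastest(timings))
```

On platforms that start worker processes with spawn (Windows, and macOS by default), the PyTorch usage would need to sit under an `if __name__ == "__main__":` guard.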
My hunch is that at the very least one can put
num_workers = 2 and always get a benefit, as long as there is at least a single GPU and 2 real CPUs.
That is the sort of minimal advice I was hoping to hear. I know this is not certain, but I trust it’s probably right.