What is a good heuristic for choosing the number of workers without running any PyTorch/GPU script?

I want to choose the number of workers without running my script (or at least pick an initial guess, since I'm not really allowed to debug on the GPUs I have access to). One idea I had was to check the number of CPUs, so I ran 'lscpu' and got this:

$ lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                80
On-line CPU(s) list:   0-79
Thread(s) per core:    2
Core(s) per socket:    20
Socket(s):             2
NUMA node(s):          2
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 79
Stepping:              1
CPU MHz:               2621.695
BogoMIPS:              4392.23
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              51200K
NUMA node0 CPU(s):     0-19,40-59
NUMA node1 CPU(s):     20-39,60-79

With these specs, what number of workers (num_workers) would you use? From reading https://unix.stackexchange.com/questions/218074/how-to-know-number-of-cores-of-a-system-in-linux

it seems that I have 2 physical CPUs (sockets) with 20 cores each, so 40 physical cores in total (80 logical CPUs with 2 threads per core). Would that justify setting the number of workers to 2, 20, or 40?
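
To make the arithmetic I am doing explicit, here is a minimal sketch that derives the core counts from the lscpu fields above. The helper names (parse_lscpu, core_counts) and the trimmed SAMPLE string are my own illustration, not part of any library, and it deliberately does not decide which value is the right one for num_workers, since that is my question.

```python
def parse_lscpu(text):
    """Parse 'Key: value' lines of lscpu output into a dict (hypothetical helper)."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines without a colon, e.g. the '$ lscpu' prompt line
            fields[key.strip()] = value.strip()
    return fields


def core_counts(fields):
    """Return (physical_cores, logical_cpus) from parsed lscpu fields."""
    sockets = int(fields["Socket(s)"])
    cores_per_socket = int(fields["Core(s) per socket"])
    threads_per_core = int(fields["Thread(s) per core"])
    physical = sockets * cores_per_socket   # 2 * 20
    logical = physical * threads_per_core   # 40 * 2
    return physical, logical


# Trimmed copy of the lscpu output from this post.
SAMPLE = """\
CPU(s):                80
Thread(s) per core:    2
Core(s) per socket:    20
Socket(s):             2
NUMA node(s):          2
"""

fields = parse_lscpu(SAMPLE)
physical, logical = core_counts(fields)
print("physical cores:", physical)  # 40
print("logical CPUs:", logical)     # 80, matches the 'CPU(s)' line
```

So the candidates I can see are 2 (sockets), 20 (cores per socket), 40 (physical cores) and 80 (logical CPUs, which is also what Python's os.cpu_count() reports on such a machine).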