How to set OMP_NUM_THREADS for distributed training?

I got a warning, but there was no link or suggestion on how to tune this number (or what it means). How do I choose this value?

Warning:

(meta_learning_a100) [miranda9@hal-dgx diversity-for-predictive-success-of-meta-learning]$ python -m torch.distributed.launch --nproc_per_node=2 ~/ultimate-utils/tutorials_for_myself/my_l2l/dist_maml_l2l_from_seba.py

/home/miranda9/miniconda3/envs/meta_learning_a100/lib/python3.9/site-packages/torch/distributed/launch.py:178: FutureWarning: The module torch.distributed.launch is deprecated
and will be removed in future. Use torch.distributed.run.
Note that --use_env is set by default in torch.distributed.run.
If your script expects `--local_rank` argument to be set, please
change it to read from `os.environ['LOCAL_RANK']` instead. See 
https://pytorch.org/docs/stable/distributed.html#launch-utility for 
further instructions

  warnings.warn(
WARNING:torch.distributed.run:*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. 
*****************************************
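
The deprecation note in the warning asks scripts to read the rank from `os.environ['LOCAL_RANK']` instead of a `--local_rank` argument. Below is a minimal sketch of a helper that works under either launcher. The name `get_local_rank` is hypothetical. The helper prefers the environment variable and falls back to the command-line flag. It defaults to 0 for a plain single-process run, which is an assumption about what you want outside distributed mode.

```python
import argparse
import os


def get_local_rank(argv=None):
    """Return the local rank, preferring the LOCAL_RANK environment variable.

    torch.distributed.run sets LOCAL_RANK in each worker's environment; the
    deprecated torch.distributed.launch passes --local_rank unless --use_env is set.
    """
    parser = argparse.ArgumentParser()
    # Accept both spellings of the flag; both store into args.local_rank.
    parser.add_argument("--local_rank", "--local-rank", dest="local_rank",
                        type=int, default=None)
    # parse_known_args ignores any other flags the script may receive.
    args, _ = parser.parse_known_args(argv)
    if "LOCAL_RANK" in os.environ:
        return int(os.environ["LOCAL_RANK"])
    if args.local_rank is not None:
        return args.local_rank
    return 0  # assumption: single-process fallback when neither is provided


if __name__ == "__main__":
    print(get_local_rank())
```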

Since it’s an environment variable, I think you can simply set its value by:
OMP_NUM_THREADS=$VALUE python -m torch.distributed.launch --nproc_per_node=2 xxxxx
This is similar to other environment variables, e.g., CUDA_VISIBLE_DEVICES.
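
If you launch from a Python wrapper rather than a shell, you can pass a modified copy of the environment to the child process. Below is a minimal sketch. The helper name `build_launch` and the script path `train.py` are placeholders. It uses `torch.distributed.run` because the warning above says `torch.distributed.launch` is deprecated. As far as I can tell, the launcher only applies its default of 1 (and prints the warning) when the variable is unset. Setting the variable explicitly should therefore also make the warning go away.

```python
import os
import sys


def build_launch(script, nproc_per_node, omp_threads):
    """Build the command and environment for a torch.distributed.run launch.

    Equivalent to:
      OMP_NUM_THREADS=<omp_threads> python -m torch.distributed.run \
          --nproc_per_node=<nproc_per_node> <script>
    """
    # Copy the current environment and override only OMP_NUM_THREADS;
    # worker processes spawned by the launcher inherit this environment.
    env = dict(os.environ, OMP_NUM_THREADS=str(omp_threads))
    cmd = [
        sys.executable, "-m", "torch.distributed.run",
        f"--nproc_per_node={nproc_per_node}",
        script,
    ]
    return cmd, env


if __name__ == "__main__":
    # "train.py" is a placeholder for your training script.
    cmd, env = build_launch("train.py", nproc_per_node=2, omp_threads=4)
    print("OMP_NUM_THREADS=" + env["OMP_NUM_THREADS"], " ".join(cmd))
```

To actually start the job, pass the result to `subprocess.run(cmd, env=env, check=True)`.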

Oh, cool. But I was curious how one chooses the value of OMP_NUM_THREADS=$VALUE.

Bro, did you solve this problem?

It seems that a good value is OMP_NUM_THREADS = nb_cpu_threads / nproc_per_node. Use htop to see the number of CPU threads.
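
The rule of thumb above can be written as a small helper. This is a minimal sketch that assumes `os.cpu_count()` is an acceptable stand-in for the number of CPU threads, since like htop it counts logical CPUs. The helper name and the floor of one thread per process are my own choices.

```python
import os


def omp_threads_per_process(nproc_per_node, cpu_threads=None):
    """Split the machine's CPU threads evenly across the local worker processes."""
    if nproc_per_node < 1:
        raise ValueError("nproc_per_node must be at least 1")
    if cpu_threads is None:
        # os.cpu_count() reports logical CPUs (hardware threads) and may return None.
        cpu_threads = os.cpu_count() or 1
    # Integer division so the processes never request more threads in total
    # than the machine has; keep at least one thread per process.
    return max(1, cpu_threads // nproc_per_node)


if __name__ == "__main__":
    print(omp_threads_per_process(nproc_per_node=2))
```

For example, on a machine with 64 hardware threads and `--nproc_per_node=2`, this gives 32.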

Set the OpenMP thread count to the number of logical CPUs:

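import os

# os.process_cpu_count() (Python 3.13+) returns the number of logical CPUs
# usable by this process, or None if that cannot be determined.
logical_cpus = os.process_cpu_count() or 1
print(logical_cpus)

# Set this before importing torch so the OpenMP runtime is sure to pick it up.
os.environ['OMP_NUM_THREADS'] = str(logical_cpus)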

OMP_NUM_THREADS should be set to #PhysicalCores / #Processes.

  • os.process_cpu_count (which @handaru mentioned; added in Python 3.13), nproc, and htop all report the number of logical CPUs. Use psutil.cpu_count(logical=False) to get the number of physical cores.
  • Don’t forget to divide it by nproc_per_node as @Kenzo mentioned, or you’ll have many processes competing for each core.
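
The two points together can be sketched as follows: divide the number of physical cores by the number of processes per node. This sketch treats `psutil` as optional. If it is not installed, or if `psutil.cpu_count(logical=False)` returns `None`, the sketch falls back to the logical count from `os.cpu_count()`, which may overestimate. The function names are hypothetical.

```python
import os


def physical_core_count():
    """Return the number of physical CPU cores, falling back to logical CPUs."""
    try:
        import psutil  # third-party: pip install psutil
        cores = psutil.cpu_count(logical=False)  # may be None on some platforms
    except ImportError:
        cores = None
    # Fallback: logical CPUs (hyper-threads included), which may overestimate.
    return cores or os.cpu_count() or 1


def omp_num_threads(nproc_per_node, cores=None):
    """Physical cores divided by the number of processes per node (at least 1)."""
    if cores is None:
        cores = physical_core_count()
    return max(1, cores // nproc_per_node)


if __name__ == "__main__":
    # Print a value that can be pasted in front of the launch command.
    print(f"OMP_NUM_THREADS={omp_num_threads(nproc_per_node=2)}")
```

Integer division rounds down, so the processes together never ask for more threads than there are physical cores.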