How to set OMP_NUM_THREADS for distributed training?

OMP_NUM_THREADS should be set to #PhysicalCores / #Processes, where #Processes is the number of training processes running on the same node.

  • os.process_cpu_count (added in Python 3.13, as @handaru mentioned), nproc, and htop all report the number of logical CPUs, not physical cores. Use psutil.cpu_count(logical=False) to get the number of physical cores.
  • Don’t forget to divide by nproc_per_node, as @Kenzo mentioned; otherwise many processes will end up competing for each core.
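Putting the two points together, here is a minimal sketch of the calculation. It assumes psutil is installed and that you launch with torchrun, which exposes the per-node process count as the LOCAL_WORLD_SIZE environment variable. The helper name omp_threads_per_process is hypothetical. The variable has to be set before torch (or numpy) is imported, because the OpenMP runtime typically reads it when it initializes.

```python
import os

try:
    import psutil  # third-party: pip install psutil
except ImportError:
    psutil = None


def omp_threads_per_process(nproc_per_node: int) -> int:
    """Hypothetical helper: physical cores split evenly across the processes on this node."""
    if nproc_per_node < 1:
        raise ValueError("nproc_per_node must be >= 1")
    physical = psutil.cpu_count(logical=False) if psutil is not None else None
    if physical is None:
        # Fallback (assumption): psutil is missing or could not determine the physical
        # core count, so use logical CPUs instead. With SMT/hyperthreading this
        # overcounts cores and may oversubscribe them.
        physical = os.cpu_count() or 1
    # Integer division so the processes never ask for more threads than there are cores.
    return max(1, physical // nproc_per_node)


if __name__ == "__main__":
    # Assumption: launched via torchrun, which sets LOCAL_WORLD_SIZE (= nproc_per_node).
    local_world_size = int(os.environ.get("LOCAL_WORLD_SIZE", "1"))
    os.environ["OMP_NUM_THREADS"] = str(omp_threads_per_process(local_world_size))
    # import torch  # import only after OMP_NUM_THREADS is set
    print("OMP_NUM_THREADS =", os.environ["OMP_NUM_THREADS"])
```

On an 8-physical-core node launched with `--nproc_per_node=4`, this would give each process 2 threads.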