Use single thread on Intel CPU

Hi,

I have been playing around with the C++ Frontend for PyTorch on my laptop (Intel® Core™ i7-4600U) and was able to include PyTorch in my C++ app by following the MNIST example (https://github.com/goldsborough/examples/blob/cpp/cpp/mnist/mnist.cpp).
My app already uses parallelization to some degree, so I would like to run training / inference on a single thread. However, I was not able to tell PyTorch to do that: it always uses all cores. Here is what I have tried so far:

  • I could not find any "torch.set_num_threads(1)" in the C++ API, so I searched the PyTorch sources for anything related and found at::set_num_threads(1);
  • Setting the OpenMP environment variable "OMP_NUM_THREADS=1"
  • Setting the MKL environment variable "MKL_NUM_THREADS=1"
  • Further investigation of the source code revealed that Caffe2 uses a ThreadPool which is initialized with cpuinfo_get_processors_count() from the cpuinfo lib (https://github.com/pytorch/cpuinfo). I did not find a way to set this from the outside.

Note that I did not compile PyTorch myself, but used the library provided on the website. There seem to be a lot of different frameworks involved, such as OpenMP, MKL, MKL-DNN etc., so I am a little confused about how to force PyTorch to use one thread. Any ideas?

Thanks!


@sbuschjaeger
Did you find any solution for this?
I now see that there is torch::set_num_threads(1), but it does not work.

When you try OMP_NUM_THREADS=1, does htop say that you are still using all CPU cores?
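If watching htop is inconvenient, a rough in-process check is to compare CPU time with wall-clock time around the workload. The sketch below is only an illustration: `cpu_to_wall_ratio` is a hypothetical helper, not part of libtorch. It assumes a POSIX system such as Linux, where `std::clock()` reports CPU time summed over all threads of the process; with MSVC on Windows `std::clock()` tracks wall time, so the ratio is not meaningful there.

```cpp
#include <chrono>
#include <ctime>

// Hypothetical helper: runs `work` once and returns
// (process CPU seconds) / (wall-clock seconds).
// Assumption: std::clock() measures CPU time across all threads (POSIX).
template <typename F>
double cpu_to_wall_ratio(F&& work) {
    const std::clock_t cpu_start = std::clock();
    const auto wall_start = std::chrono::steady_clock::now();

    work();  // e.g. one training step or one forward pass

    const std::clock_t cpu_end = std::clock();
    const auto wall_end = std::chrono::steady_clock::now();

    const double cpu_seconds =
        static_cast<double>(cpu_end - cpu_start) / CLOCKS_PER_SEC;
    const double wall_seconds =
        std::chrono::duration<double>(wall_end - wall_start).count();

    // Guard against a zero-length measurement.
    return wall_seconds > 0.0 ? cpu_seconds / wall_seconds : 0.0;
}
```

A ratio close to 1 suggests the work kept roughly one core busy, while a ratio near the number of cores suggests it was spread across all of them. Wrap a call that runs long enough (tens of milliseconds or more) for the timing to be stable.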

@yf225
Yes, I added omp_set_num_threads(1) (which takes precedence over OMP_NUM_THREADS=1) at the beginning of the main function, and it still uses all the CPU cores.
I also removed omp_set_num_threads(1) from the code and set OMP_NUM_THREADS=1 on the command line before running the MNIST example, and it still uses all of the CPU cores.

@afshin67 did you manage to limit the threads to 1? I have the same problem: after calling omp_set_num_threads(1) or torch::set_num_threads(1), all cores are still occupied. My libtorch was downloaded from the official PyTorch website too.

My PC has 8 cores, and the results of torch::get_num_threads() and omp_get_max_threads() differ. Also, calling torch::set_num_threads(1) had no effect on omp_get_max_threads():

std::cout << torch::get_num_threads() << std::endl;
-1
std::cout << omp_get_max_threads() << std::endl;
8

torch::set_num_threads(1);

std::cout << omp_get_max_threads() << std::endl;
8
std::cout << torch::get_num_threads() << std::endl;
1

omp_set_num_threads(1);

std::cout << omp_get_max_threads() << std::endl;
1

No, I still have the problem and reported it on the GitHub page too:

I managed to limit the thread usage to 1 by using the latest nightly build of libtorch and the code below:

#include "ATen/Parallel.h"
at::set_num_threads(1);

It did not work for me. Did you try it with mnist?

I didn’t try it with MNIST. Did you test with the nightly build of libtorch? It only works with the nightly build.

Yes, I downloaded the nightly build and tested with that.

Hi, has there been any progress on this problem? We have a similar problem with PyTorch 1.1 and the C++ frontend. In Python, I am able to set the thread limit to 1.


I did what you suggested, but my C++ code does not compile: the line "at::set_num_threads(1);" is reported as wrong. Also, how should I set the environment variables?

How do you set the number of threads to 1 in Python? Did anyone solve this problem and force PyTorch to use only 1 thread?

The following settings limit the number of threads to 1, but as soon as I use an object detection model in torch, the number of threads rises to 8.

import os

os.environ["OMP_NUM_THREADS"] = "1"
os.environ["MKL_NUM_THREADS"] = "1"

import torch

torch.set_num_threads(1)
torch.set_num_interop_threads(1)
# count the threads of the process named "project"
os.system("top -H -b -n1 | grep project | wc -l")

I’m getting the same problem with my C++ torch project on Windows (I’m building it as a DLL and calling it from a .NET Core console app). I have set both the "OMP_NUM_THREADS" and the "MKL_NUM_THREADS" environment variables to 1. I have also tried #include "ATen/Parallel.h" and at::set_num_threads(1); at the top of my code. Running the app still spawns 14 threads and maxes out all 8 cores. Has anyone solved this on Windows?