How to check if PyTorch is using BLAS?

In NumPy, there is a simple way to check which BLAS is being used: numpy.show_config().

Is there a similar way in PyTorch?

I am experiencing abnormally slow performance with CPU-only PyTorch on a remote server,
and I suspect PyTorch is not using BLAS.
So I am looking for ways to check:

  1. if PyTorch is using BLAS;
  2. which BLAS

Thanks in advance!


The binaries are all built with MKL, as you can verify by printing torch.__config__.show().

However, also be sure to check torch.__config__.parallel_info(), since it could be that the number of threads is not set properly.
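
For reference, here is a minimal sketch of how both checks could be run defensively. The helper name describe_torch_build is made up for illustration, and the hasattr guards are an assumption on my part that older releases may lack torch.__config__ or parallel_info, in which case those entries are simply skipped.

```python
import importlib.util


def describe_torch_build():
    """Collect build and threading info, using only attributes that exist."""
    # Hypothetical helper: returns a plain dict so it is easy to log or diff.
    if importlib.util.find_spec("torch") is None:
        return {"installed": False}

    import torch

    info = {"installed": True, "version": torch.__version__}

    # torch.__config__ may be missing on older releases, so probe for it.
    config = getattr(torch, "__config__", None)
    if config is not None and hasattr(config, "show"):
        info["build"] = config.show()  # compile-time libraries (e.g. MKL)
    if config is not None and hasattr(config, "parallel_info"):
        info["parallel"] = config.parallel_info()  # threading backends

    info["mkl_available"] = torch.backends.mkl.is_available()
    info["num_threads"] = torch.get_num_threads()
    return info


report = describe_torch_build()
for key, value in report.items():
    print(f"{key}: {value}")
```

If "build" or "parallel" is absent from the printed report, the installed version probably predates those functions.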


Thanks very much! On torch-1.0.1.post2, it shows

AttributeError: module 'torch' has no attribute '__config__'

Oh it could be a new thing added in 1.1.0.

Thanks. Just updated to ‘1.1.0’ and it shows a lot of information.

Question-1: does this output literally list the libs that PyTorch is linked to and will use at runtime?

Reason for the question: I observe a more than 100x slowdown on a remote server (where I am not an admin) compared with my personal laptop, so I suspect it is not really using the resources it should, and I need a way to check that.


AttributeError: module 'torch.__config__' has no attribute 'parallel_info'?

Ah, my bad, I am using a nightly build. That function was probably added later. You can still check the number of threads using the usual POSIX functionality.

Thanks all the same! I used torch.get_num_threads() to see the number of threads being used, and found that this was the cause of the abnormal slowdown. When I set OMP_NUM_THREADS back to the number of cores, performance returned to normal.
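
In case it helps anyone else, below is a rough sketch of the comparison I mean: PyTorch's thread count next to what the machine offers. The function name thread_report is just illustrative, and os.sched_getaffinity only exists on Linux, so it is guarded. On a shared cluster, the affinity count is likely the more relevant number, since it reflects the CPUs the job is actually allowed to use.

```python
import importlib.util
import os


def thread_report():
    """Gather the thread-related settings that are visible to this process."""
    report = {
        "logical_cpus": os.cpu_count(),
        "OMP_NUM_THREADS": os.environ.get("OMP_NUM_THREADS"),
        "MKL_NUM_THREADS": os.environ.get("MKL_NUM_THREADS"),
    }

    # Linux only: CPUs this process may run on (can be fewer than the total
    # when a scheduler restricts the job).
    if hasattr(os, "sched_getaffinity"):
        report["usable_cpus"] = len(os.sched_getaffinity(0))

    # Only query PyTorch if it is installed in this environment.
    if importlib.util.find_spec("torch") is not None:
        import torch

        report["torch_threads"] = torch.get_num_threads()
    return report


threads = thread_report()
for name, value in threads.items():
    print(f"{name}: {value}")
```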

However, I was wondering:

  1. What is the best practice for setting OMP_NUM_THREADS? The number of physical cores, the number of logical cores, or neither?
  2. Where is the best place to set it? I have two options:
    2.1) set the env var OMP_NUM_THREADS when submitting jobs to the cluster
    2.2) set it via torch.set_num_threads()
    I am not sure which way is better, or why. Can you help?

I usually try different values between those two numbers. But I am not really an expert…

Use torch.set_num_threads. PyTorch uses OMP, MKL, and a native thread pool (as well as TBB maybe). The function takes care of all of them. Not sure if the env flag will set all of them.

I tried many options and found smaller values tend to give good performance in my case.
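
For anyone who wants to repeat this kind of experiment, here is a small sketch that times a matrix multiplication under a few thread counts set through torch.set_num_threads. The matrix size, repeat count, candidate values, and the helper name time_matmul are arbitrary choices for illustration; a real workload may behave differently, so treat the numbers as a rough guide only.

```python
import importlib.util
import os
import time


def time_matmul(num_threads, size=512, repeats=5):
    """Average seconds per matmul with the given intra-op thread count."""
    import torch

    torch.set_num_threads(num_threads)
    a = torch.randn(size, size)
    b = torch.randn(size, size)
    torch.mm(a, b)  # warm-up run, excluded from the timing
    start = time.perf_counter()
    for _ in range(repeats):
        torch.mm(a, b)
    return (time.perf_counter() - start) / repeats


timings = {}
if importlib.util.find_spec("torch") is not None:
    import torch

    original = torch.get_num_threads()
    cpus = os.cpu_count() or 1
    # Candidate values: one thread, half the logical CPUs, all logical CPUs.
    for n in sorted({1, max(1, cpus // 2), cpus}):
        timings[n] = time_matmul(n)
    torch.set_num_threads(original)  # restore the previous setting

for n, seconds in timings.items():
    print(f"{n} thread(s): {seconds * 1000:.2f} ms per matmul")
```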

Thanks for the advice!