How to see details behind CPU-only Libtorch Matrix-Matrix multiplication routines?

velenos14 · June 22, 2023, 4:48pm

I have downloaded the libtorch CPU-only version from the website and unzipped it.

In my .cpp application, I set the thread counts as follows (I am using Intel MKL):

    omp_set_num_threads(64);
    mkl_set_num_threads(64);

I then check:

    std::cout << "torch::get_num_threads() returns: " << torch::get_num_threads() << std::endl;

    std::cout << "omp_get_max_threads() returns: " << omp_get_max_threads() << std::endl;
    std::cout << "mkl_get_max_threads() returns: " << mkl_get_max_threads() << std::endl;

These all return 64. (Yes, I do have that many cores: I am on an HPC machine with 128 cores per node, and I am launching 2 MPI processes per node.)

I then perform std::complex<double> matrix-matrix multiplications via torch::matmul().
These multiplications seem slow to me.

How can I check that:

- Libtorch uses MKL behind the scenes?
- Libtorch uses threads for its matrix-matrix multiplications? Does my check above guarantee that Libtorch uses more than 1 thread behind the scenes?

Thank you!

akss · May 29, 2024, 4:30am

The linked libraries can be listed with the Dependency Walker tool (or similar) on Windows and with ldd on Linux. Alternatively, a running process can be killed with a signal that produces a core dump (for example SIGABRT), and the call stacks in the core dump should reveal MKL calls.
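
The ldd check can also be approximated from inside the running program. The following is a minimal sketch, assuming Linux: it reads /proc/self/maps and returns the files mapped into the process whose file name contains a given substring. The helper name `loaded_objects_matching` is hypothetical and not part of Libtorch or MKL.

```cpp
#include <fstream>
#include <set>
#include <string>
#include <vector>

// Hypothetical helper (not a Libtorch API). Linux only.
// Returns the distinct paths mapped into the current process whose
// file name contains `needle`, e.g. "mkl", "gomp" or "iomp".
std::vector<std::string> loaded_objects_matching(const std::string& needle) {
    std::ifstream maps("/proc/self/maps");
    std::set<std::string> hits;  // a set, because each file is mapped several times
    std::string line;
    while (std::getline(maps, line)) {
        // Each line is "address perms offset dev inode pathname".
        // The pathname, when present, starts at the first '/'.
        const auto slash = line.find('/');
        if (slash == std::string::npos) continue;  // anonymous mapping
        const std::string path = line.substr(slash);
        // Match against the file name only, not the directories.
        const std::string name = path.substr(path.rfind('/') + 1);
        if (name.find(needle) != std::string::npos) hits.insert(path);
    }
    return {hits.begin(), hits.end()};
}
```

Calling `loaded_objects_matching("mkl")` after the first torch::matmul() would list any MKL shared objects loaded at that point. An empty result is not proof that MKL is unused, since MKL may be linked statically into the Libtorch library itself, in which case no separate MKL file shows up.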

The number of threads can be observed with OS tools: Sysinternals Process Explorer on Windows, or top on Linux (top -H lists individual threads).
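
The same thread count that top reports can be read programmatically. This is a minimal sketch, assuming Linux: it parses the "Threads:" field of /proc/self/status. The function name `current_thread_count` is hypothetical and not part of Libtorch.

```cpp
#include <fstream>
#include <string>

// Hypothetical helper (not a Libtorch API). Linux only.
// Returns the number of OS threads in the current process as reported
// by /proc/self/status, or -1 if the field cannot be read.
int current_thread_count() {
    std::ifstream status("/proc/self/status");
    const std::string key = "Threads:";
    std::string line;
    while (std::getline(status, line)) {
        if (line.compare(0, key.size(), key) == 0) {
            // std::stoi skips the whitespace that follows the key.
            return std::stoi(line.substr(key.size()));
        }
    }
    return -1;
}
```

Printing `current_thread_count()` once before and once after the first torch::matmul() call gives a rough check: worker threads are typically created when the first parallel region runs, so a count that stays at 1 or 2 after a large multiplication suggests the operation ran single-threaded, regardless of what torch::get_num_threads() reports.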