LibTorch symbols getting overridden when linking to openBLAS

I have an executable that dynamically links to both openBLAS and LibTorch.
AFAICT the LibTorch pre-built binaries are statically linked against MKL, so they include their own versions of the BLAS symbols, e.g.:

nm libtorch/lib/libtorch_cpu.so | grep "T sgemm_"
0000000006c531b0 T sgemm_
0000000006c53870 T sgemm_64
0000000006c53870 T sgemm_64_

However, when linking to both openBLAS and LibTorch, it seems that openBLAS's symbols take precedence over LibTorch's, so when executing a program that does:

 torch::Tensor tensor = torch::randn({2000, 2000});

It will get to execute sgemm_ from openBLAS instead:

#0  0x00007ffff5537da0 in sgemm_ () from /lib/x86_64-linux-gnu/libopenblas.so.0
#1  0x00007fffde5385d6 in at::native::cpublas::gemm(at::native::TransposeType, at::native::TransposeType, long, long, long, float, float const*, long, float const*, long, float, float*, long) () from /home/rstudio/data/torch/build-lantern/libtorch/lib/libtorch_cpu.so
#2  0x00007fffde67c139 in at::native::addmm_impl_cpu_(at::Tensor&, at::Tensor const&, at::Tensor, at::Tensor, c10::Scalar const&, c10::Scalar const&) () from /home/rstudio/data/torch/build-lantern/libtorch/lib/libtorch_cpu.so
#3  0x00007fffde67d475 in at::native::structured_mm_out_cpu::impl(at::Tensor const&, at::Tensor const&, at::Tensor const&) ()
   from /home/rstudio/data/torch/build-lantern/libtorch/lib/libtorch_cpu.so
#4  0x00007fffdf42309b in at::(anonymous namespace)::wrapper_CPU_mm(at::Tensor const&, at::Tensor const&) ()
   from /home/rstudio/data/torch/build-lantern/libtorch/lib/libtorch_cpu.so
#5  0x00007fffdf423123 in c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<at::Tensor (at::Tensor const&, at::Tensor const&), &at::(anonymous namespace)::wrapper_CPU_mm>, at::Tensor, c10::guts::typelist::typelist<at::Tensor const&, at::Tensor const&> >, at::Tensor (at::Tensor const&, at::Tensor const&)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, at::Tensor const&) () from /home/rstudio/data/torch/build-lantern/libtorch/lib/libtorch_cpu.so
#6  0x00007fffdf1eaa70 in at::_ops::mm::redispatch(c10::DispatchKeySet, at::Tensor const&, at::Tensor const&) ()
   from /home/rstudio/data/torch/build-lantern/libtorch/lib/libtorch_cpu.so

Does anyone know why this would happen? I'd expect at::native::cpublas::gemm to call into the internally included global symbol sgemm_.

FWIW, here is a reproducible example. The CMakeLists.txt file looks like:
# cmake_minimum_required() must be the very first command in the file, and
# project() must run before variables that affect language configuration —
# CMAKE_POSITION_INDEPENDENT_CODE set before project() may be ignored.
cmake_minimum_required(VERSION 3.0 FATAL_ERROR)
project(example)

# Build all targets as position-independent code (-fPIC).
set(CMAKE_POSITION_INDEPENDENT_CODE ON)

find_package(Torch REQUIRED)
find_package(BLAS)

add_executable(example example.cpp)
# NOTE(review): link order matters for the symbol-interposition issue being
# discussed — libraries listed earlier on the link line resolve symbols first.
target_link_libraries(example "${TORCH_LIBRARIES}" "${BLAS_LIBRARIES}")
set_property(TARGET example PROPERTY CXX_STANDARD 14)

and example.cpp is simply the code below. We have to call into a cblas_ function to make sure the openBLAS symbols are pulled in when linking:

#include <iostream>
#include <dlfcn.h>
#include <torch/torch.h>
#include <cblas.h>

int main() {
  
  int m = 3; // rows of A
  int n = 3; // cols of A

  // Matrix A (m x n) in row-major order
  double A[] = {1.0, 2.0, 3.0,
                4.0, 5.0, 6.0,
                7.0, 8.0, 9.0};

  // Vector x (size n)
  double x[] = {1.0, 1.0, 1.0};

  // Result vector y (size m), initially zero
  double y[] = {0.0, 0.0, 0.0};

  // Scalar multipliers
  double alpha = 1.0, beta = 0.0;

  // Perform y = alpha * A * x + beta * y
  cblas_dgemv(CblasRowMajor, CblasNoTrans, m, n, alpha, A, n, x, 1, beta, y, 1);
  

  for (auto i = 1; i < 10; i++) {
    torch::Tensor tensor = torch::randn({2000, 2000});
    auto k = tensor.mm(tensor);  
  }
  
  return 0;
}