I have an executable that dynamically links to both OpenBLAS and LibTorch.
AFAICT the LibTorch pre-built binaries are statically linked against MKL, so they include their own copies of the BLAS symbols, e.g.:
nm libtorch/lib/libtorch_cpu.so | grep "T sgemm_"
0000000006c531b0 T sgemm_
0000000006c53870 T sgemm_64
0000000006c53870 T sgemm_64_
However, when linking against both OpenBLAS and LibTorch, OpenBLAS's symbols seem to take precedence over LibTorch's, so when I execute a program that does:
torch::Tensor tensor = torch::randn({2000, 2000});
it ends up executing sgemm_ from OpenBLAS instead:
#0 0x00007ffff5537da0 in sgemm_ () from /lib/x86_64-linux-gnu/libopenblas.so.0
#1 0x00007fffde5385d6 in at::native::cpublas::gemm(at::native::TransposeType, at::native::TransposeType, long, long, long, float, float const*, long, float const*, long, float, float*, long) () from /home/rstudio/data/torch/build-lantern/libtorch/lib/libtorch_cpu.so
#2 0x00007fffde67c139 in at::native::addmm_impl_cpu_(at::Tensor&, at::Tensor const&, at::Tensor, at::Tensor, c10::Scalar const&, c10::Scalar const&) () from /home/rstudio/data/torch/build-lantern/libtorch/lib/libtorch_cpu.so
#3 0x00007fffde67d475 in at::native::structured_mm_out_cpu::impl(at::Tensor const&, at::Tensor const&, at::Tensor const&) ()
from /home/rstudio/data/torch/build-lantern/libtorch/lib/libtorch_cpu.so
#4 0x00007fffdf42309b in at::(anonymous namespace)::wrapper_CPU_mm(at::Tensor const&, at::Tensor const&) ()
from /home/rstudio/data/torch/build-lantern/libtorch/lib/libtorch_cpu.so
#5 0x00007fffdf423123 in c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<at::Tensor (at::Tensor const&, at::Tensor const&), &at::(anonymous namespace)::wrapper_CPU_mm>, at::Tensor, c10::guts::typelist::typelist<at::Tensor const&, at::Tensor const&> >, at::Tensor (at::Tensor const&, at::Tensor const&)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, at::Tensor const&) () from /home/rstudio/data/torch/build-lantern/libtorch/lib/libtorch_cpu.so
#6 0x00007fffdf1eaa70 in at::_ops::mm::redispatch(c10::DispatchKeySet, at::Tensor const&, at::Tensor const&) ()
from /home/rstudio/data/torch/build-lantern/libtorch/lib/libtorch_cpu.so
Does anyone know why this would happen? I'd expect at::native::cpublas::gemm
to call into the internally included global symbol sgemm_
.
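Not from the original report, but a small way to check at runtime which shared object actually ends up providing sgemm_ is to ask the dynamic linker via dlsym/dladdr (glibc-specific; on older toolchains you may need to link with -ldl). A minimal sketch:

```cpp
#include <dlfcn.h>
#include <cstdio>
#include <string>

// Return the path of the shared object that currently provides `symbol`
// in this process, or an empty string if the symbol is not resolved.
static std::string providing_library(const char* symbol) {
    void* addr = dlsym(RTLD_DEFAULT, symbol);  // search the global scope
    if (!addr) return "";
    Dl_info info;
    if (dladdr(addr, &info) && info.dli_fname) return info.dli_fname;
    return "";
}

int main() {
    std::string lib = providing_library("sgemm_");
    if (lib.empty())
        std::printf("sgemm_ is not resolved in this process\n");
    else
        std::printf("sgemm_ resolves to %s\n", lib.c_str());
    return 0;
}
```

Running this inside the affected executable should print the OpenBLAS path, confirming that the interposed symbol (not libtorch_cpu.so's internal copy) is the one in the global scope.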
Reproducible code
FWIW, the CMakeLists.txt file looks like:
set(CMAKE_POSITION_INDEPENDENT_CODE ON)
cmake_minimum_required(VERSION 3.0 FATAL_ERROR)
project(example)
find_package(Torch REQUIRED)
find_package(BLAS)
add_executable(example example.cpp)
target_link_libraries(example "${TORCH_LIBRARIES}" "${BLAS_LIBRARIES}")
set_property(TARGET example PROPERTY CXX_STANDARD 14)
and example.cpp is simply the code below. We have to call into a cblas_
function to make sure the OpenBLAS symbols are included when linking.
#include <iostream>
#include <dlfcn.h>
#include <torch/torch.h>
#include <cblas.h>

int main() {
  int m = 3; // rows of A
  int n = 3; // cols of A

  // Matrix A (m x n) in row-major order
  double A[] = {1.0, 2.0, 3.0,
                4.0, 5.0, 6.0,
                7.0, 8.0, 9.0};

  // Vector x (size n)
  double x[] = {1.0, 1.0, 1.0};

  // Result vector y (size m), initially zero
  double y[] = {0.0, 0.0, 0.0};

  // Scalar multipliers
  double alpha = 1.0, beta = 0.0;

  // Perform y = alpha * A * x + beta * y
  cblas_dgemv(CblasRowMajor, CblasNoTrans, m, n, alpha, A, n, x, 1, beta, y, 1);

  for (auto i = 1; i < 10; i++) {
    torch::Tensor tensor = torch::randn({2000, 2000});
    auto k = tensor.mm(tensor);
  }
  return 0;
}
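For what it's worth, another glibc-specific way to see the interposition as it happens (assuming the `example` binary built from the CMakeLists above) is to have the runtime linker log every symbol binding it performs:

```shell
# glibc's ld.so logs one "binding file A to B ... symbol `sgemm_'" line per
# resolution; the second file named on the line is the DSO that won.
LD_DEBUG=bindings ./example 2>&1 | grep "sgemm_"
```

On the setup described above, these lines should name libopenblas.so.0 as the provider, matching the backtrace.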