Custom operation - problem with usage of CPUBlas functionality

Hi,
I’m working on a custom operation that is part of a shared library (let’s call it libxxx.so).
While browsing the torch source, I found the gemm_batched function, which fits my implementation perfectly.
However, I ran into a couple of issues with it.

  1. The signature declared in CPUBlas.h differs from the one defined in CPUBlas.cpp.
    This results in undefined symbols that cannot be found anywhere, either at link time when building the C++ test binary or at load time from Python. After aligning the signatures in the torch code, I ran into the second issue (I guess I could also drop the CPUBlas.h include and declare gemm_batched as extern myself).
  2. When libxxx.so gets loaded, I get the following error:
OSError: project_root/libxxx.so: undefined symbol: _ZN2at6native7cpublas12gemm_batchedIfEEvNS0_13TransposeTypeES3_llllT_PPKS4_lS7_lS4_PPS4_l

which corresponds to:

void at::native::cpublas::gemm_batched<float>(at::native::TransposeType, at::native::TransposeType, long, long, long, long, float, float const**, long, float const**, long, float, float**, long)

This symbol is indeed marked as undefined in libxxx.so. ldd shows that libxxx.so depends on libtorch_cpu.so, which does contain the symbol, yet it is not resolved in my case.
When building the C++ test binary I get:

/usr/bin/c++ (...) libxxx.so  lib/libgtest_main.a  ../.venv/lib/python3.8/site-packages/torch/lib/libtorch.so  -Wl,--no-as-needed,"project_root/.venv/lib/python3.8/site-packages/torch/lib/libtorch_cpu.so" (...)
/usr/bin/ld: libxxx.so: undefined reference to `void at::native::cpublas::gemm_batched<float>(at::native::TransposeType, at::native::TransposeType, long, long, long, long, float, float const**, long, float const**, long, float, float**, long)'
collect2: error: ld returned 1 exit status

So this is a similar issue, even though libtorch_cpu.so, which contains the required symbol, is in fact linked.
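
One thing that could still be checked is whether the symbol is actually exported from libtorch_cpu.so, not just present in it: a symbol can show up in a full symbol listing and still be invisible to the linker and the dynamic loader if it is local or has hidden visibility. Below is a minimal sketch of such a check using only Python's ctypes. The helper name is_exported is hypothetical (it is not part of torch), and the library path and mangled name are supplied by the caller.

```python
import ctypes
from typing import Optional


def is_exported(library_path: Optional[str], mangled_name: str) -> bool:
    """Return True if the dynamic loader can resolve `mangled_name`.

    `library_path` is the shared library to load. Passing None looks the
    name up in the libraries already loaded into the current process
    (POSIX only). Only symbols in the dynamic symbol table are found;
    local or hidden symbols are not.
    """
    # Raises OSError if the library itself cannot be loaded.
    lib = ctypes.CDLL(library_path)
    try:
        # Attribute access performs a dlsym-style lookup on the library.
        getattr(lib, mangled_name)
    except AttributeError:
        return False
    return True


if __name__ == "__main__":
    import sys

    # Hypothetical command-line usage: <script> <library path> <mangled name>
    if len(sys.argv) == 3:
        print(is_exported(sys.argv[1], sys.argv[2]))
```

Running it with the path to libtorch_cpu.so and the mangled name from the OSError above would show whether the loader can see the symbol at all. A False result would point at symbol visibility, not at the link line.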

More about the build setup:

find_package(Torch REQUIRED)
target_link_libraries(${PROJECT_NAME} PRIVATE ${TORCH_LIBRARIES})

where Torch_DIR is set to “project_root/.venv/lib/python3.8/site-packages/torch/share/cmake”

There are no issues with any other functionality from at/torch. Any ideas on how to overcome this problem?