Help Needed: How to Integrate AMD BLIS & AMD LibFLAME into PyTorch CPU Backend?

Hello,

I have built libtorch (torch_no_python) using the CMake configuration below and installed it locally. After the build completed successfully with BUILD_LIBTORCH_WHL=1 python setup.py bdist_wheel, I installed the resulting wheel into a Python 3.11.11 virtual environment. I then copied the lib, include, bin, and share directories from libtorch into the environment's torch package directory at /home/s/bt/lib/python3.11/site-packages/torch/.
cmake -B build -S . -G Ninja \
-DCMAKE_BUILD_TYPE=Debug \
-DBUILD_TEST=OFF \
-DBUILD_EXECUTORCH=OFF \
-DBUILD_FUNCTORCH=ON \
-DUSE_MPI=OFF \
-DBUILD_ONEDNN_GRAPH=OFF \
-DINTERN_BUILD_MOBILE=OFF \
-DCAFFE2_USE_EIGEN_FOR_BLAS=OFF \
-DUSE_EIGEN_FOR_BLAS=OFF \
-DCAFFE2_USE_MKL=OFF \
-DCAFFE2_PERF_WITH_AVX2=ON \
-DINTERN_BUILD_ATEN_OPS=ON \
-DBLAS=BLIS \
-DBLAS_blis_LIBRARY="/home/s/.local/amd-blis/lib/libblis-mt.so" \
-DBLIS_INCLUDE_DIR="/home/s/.local/amd-blis/include;/home/s/.local/amd-libflame/include" \
-DBLIS_LIB="/home/s/.local/amd-blis/lib/libblis-mt.so;/home/s/.local/amd-libflame/lib/libflame.so" \
-DLAPACK_LIBRARIES="/home/s/.local/amd-libflame/lib/libflame.so" \
-DLAPACK_INFO=FLAME \
-DMAGMA_INCLUDE_DIR="/home/s/.local/magma/include" \
-DMAGMA_LIBRARIES="/home/s/.local/magma/lib/libmagma.so;/home/s/.local/magma/lib/libmagma_sparse.so" \
-DUSE_ROCM=OFF \
-DUSE_CUDA=ON \
-DUSE_XPU=OFF \
-DUSE_MKL=OFF \
-DUSE_FBGEMM=ON \
-DUSE_FAKELOWP=ON \
-DCMAKE_C_IMPLICIT_LINK_DIRECTORIES="/lib/x86_64-linux-gnu/libgfortran.so.5" \
-DUSE_MKLDNN=OFF \
-DUSE_OPENMP=ON \
-DUSE_NCCL=OFF \
-DUSE_ITT=OFF \
-DUSE_MIMALLOC=ON \
-DUSE_XNNPACK=OFF \
-DTORCH_CUDA_ARCH_LIST="8.0;8.6;8.9" \
-DCMAKE_CUDA_ARCHITECTURES="80;86;89" \
-DCMAKE_CUDA_COMPILER="/usr/local/cuda-12.8/bin/nvcc" \
-DUSE_CUDNN=ON \
-DCUDNN_INCLUDE_DIR="/usr/include" \
-DCUDNN_LIBRARY="/usr/lib/x86_64-linux-gnu/libcudnn.so" \
-DCUDA_TOOLKIT_ROOT_DIR="/usr/local/cuda-12.8" \
-DUSE_CUSPARSELT=ON \
-DUSE_CUFILE=ON \
-DUSE_CUDSS=ON \
-DUSE_NVRTC=ON \
-DUSE_NUMA=ON \
-DONNX_BUILD_SHARED_LIBS=ON \
-DBUILD_NVFUSER=ON \
-DBUILD_ONNX_PYTHON=ON \
-DPYTHON_EXECUTABLE="$(which python3)" \
-DPYTHON_INCLUDE_DIR="/home/s/bt/include/python3.11" \
-DPYTHON_LIBRARY="/usr/local/python3.11.11/lib/libpython3.11.so" \
-DPYTHON_LIBRARIES="/usr/local/python3.11.11/lib/libpython3.11.so" \
-DCMAKE_C_COMPILER="/usr/bin/x86_64-linux-gnu-gcc-13" \
-DCMAKE_CXX_COMPILER="/usr/bin/x86_64-linux-gnu-g++-13" \
-DCMAKE_INSTALL_PREFIX="$HOME/local/libtorch2.6.0" \
-DCMAKE_PREFIX_PATH="/home/s/bt/lib/python3.11/site-packages;/home/s/.local/amd-blis/include;/home/s/.local/amd-libflame/include" \
-DCMAKE_INCLUDE_PATH="/home/s/.local/amd-blis/include;/home/s/.local/amd-libflame/include" \
-DCMAKE_LIBRARY_PATH="/home/s/.local/amd-blis/lib;/home/s/.local/amd-libflame/lib" \
-DUSE_MAGMA=ON \
-DMAGMA_V2=ON \
-DCMAKE_VERBOSE_MAKEFILE=ON \
-DCUDA_NVRTC_LIB="/usr/local/cuda-12.8/targets/x86_64-linux/lib/libnvrtc.so" \
-DCUDA_NVRTC_SHORTHASH="3e858d13" \
-DUSE_TENSORRT=ON \
-DTENSORRT_LIBRARY="/usr/lib/x86_64-linux-gnu/libnvinfer.so" \
-DTENSORRT_INCLUDE_DIR="/usr/include/x86_64-linux-gnu" \
-DCMAKE_BUILD_WITH_INSTALL_RPATH=ON \
-DBUILD_SHARED_LIBS=ON \
-DBUILD_PYTHON=ON \
-DHAVE_SOVERSION=ON \
-DBUILD_BUNDLE_PTXAS=ON \
-DPYTHON_VERSION_STRING="3.11" \
-DPYTHON_SIX_SOURCE_DIR="/home/sk/bt/lib/python3.11/site-packages" \
-DCMAKE_EXPORT_COMPILE_COMMANDS=ON \
-DLIBGMP="/usr/lib/x86_64-linux-gnu/libgmp.so" \
-DLIBFFTW3="/usr/lib/x86_64-linux-gnu/libfftw3.so" \
-DOpenMP_C_LIBRARIES="/usr/lib/gcc/x86_64-linux-gnu/13/libgomp.so" \
-DOpenMP_CXX_LIBRARIES="/usr/lib/gcc/x86_64-linux-gnu/13/libgomp.so" \
-DOpenMP_C_INCLUDE_DIRS="/usr/lib/gcc/x86_64-linux-gnu/13/include" \
-DOpenMP_CXX_INCLUDE_DIRS="/usr/lib/gcc/x86_64-linux-gnu/13/include" \
-DCMAKE_C_FLAGS="-march=znver4 -O3 -ffast-math -mavx512f -mavx512dq -mavx512vnni -mavx512bf16 -mavx512vl -mavx512bw -mavx2 -mfma -pthread -fopenmp -funroll-loops -Wno-error" \
-DCMAKE_CXX_FLAGS="-march=znver4 -O3 -ffast-math -mavx512f -mavx512dq -mavx512vnni -mavx512bf16 -mavx512vl -mavx512bw -mavx2 -mfma -pthread -fopenmp -funroll-loops -Wno-error"
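For what it's worth, one way to see which BLAS the installed libtorch actually linked is to inspect libtorch_cpu.so directly. This is only a diagnostic sketch; the TORCH_LIB path matches the install layout described above and will need adjusting for other environments:

```shell
# Directory where the torch shared libraries were copied (from my setup above).
TORCH_LIB=/home/s/bt/lib/python3.11/site-packages/torch/lib

# Dynamic dependencies: does the CPU backend pull in BLIS/FLAME, or something else?
ldd "$TORCH_LIB/libtorch_cpu.so" | grep -Ei 'blis|flame|openblas|lapack|mkl'

# If BLIS was linked statically, its bli_* symbols still show up in the symbol table.
nm -D "$TORCH_LIB/libtorch_cpu.so" | grep -i 'bli_' | head
```

If neither command mentions BLIS or FLAME, the configure step silently fell back to another BLAS, regardless of what the CMake cache was told.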
Next, I cleaned the PyTorch source tree by removing the build directory and running python setup.py clean. I then re-ran the same CMake command used for libtorch and confirmed in the configure output that AMD BLIS and AMD LibFLAME were detected. After that, I built the main PyTorch package with:

BUILD_PYTHON_ONLY=1 LDFLAGS="-L/home/s/bt/lib/python3.11/site-packages/torch/lib -Wl,-rpath,/home/s/bt/lib/python3.11/site-packages/torch/lib" python setup.py bdist_wheel

The build completes successfully. After installing the newly built PyTorch package into my virtual environment, CUDA and cuDNN are detected correctly. However, when I run import torch followed by print(torch.__config__.show()), the output does not list AMD BLIS and AMD LibFLAME as the BLAS and LAPACK backends.
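To make the check scriptable, the backend can be read out of the show() text rather than eyeballed. A small helper, assuming the output contains a BLAS_INFO=... entry as typical torch.__config__.show() build-settings sections do:

```python
import re

def detect_blas_backend(config_text: str) -> str:
    """Return the BLAS backend named in torch.__config__.show() output.

    The build-settings section of show() typically contains an entry like
    'BLAS_INFO=mkl' or 'BLAS_INFO=open', mirroring the BLAS_INFO value
    baked in at build time. Returns 'unknown' if no such entry is found.
    """
    match = re.search(r"BLAS_INFO=(\w+)", config_text)
    return match.group(1) if match else "unknown"

# Example with a snippet shaped like typical show() output:
sample = "Build settings: BLAS_INFO=blis, LAPACK_INFO=flame, BUILD_TYPE=Debug"
print(detect_blas_backend(sample))  # -> blis
```

On a real install you would pass the live output: detect_blas_backend(torch.__config__.show()).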

Question:

How can I correctly integrate AMD BLIS (specified as blas=blis) and AMD LibFLAME (specified as lapack=flame) into PyTorch’s CPU backend? Are there any specific build flags or configuration steps that I need to modify to achieve this integration?
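In case it helps frame answers: one thing I have not tried is driving the selection entirely through environment variables at setup.py time. Since setup.py runs its own CMake configure step, cache variables passed to a separate cmake invocation may simply not reach the build. The sketch below is an assumption, not a verified recipe: setup.py documents a BLAS variable whose accepted values include BLIS, and CMAKE_INCLUDE_PATH / CMAKE_LIBRARY_PATH are standard CMake search-path environment variables, here pointed at the AMD installs used above:

```shell
# Select BLIS at configure time via setup.py's documented BLAS variable.
export BLAS=BLIS

# Standard CMake search-path envs (colon-separated on Unix), so that
# find_library/find_path can locate the AMD BLIS and LibFLAME installs.
export CMAKE_INCLUDE_PATH="/home/s/.local/amd-blis/include:/home/s/.local/amd-libflame/include"
export CMAKE_LIBRARY_PATH="/home/s/.local/amd-blis/lib:/home/s/.local/amd-libflame/lib"

python setup.py clean
BUILD_LIBTORCH_WHL=1 python setup.py bdist_wheel
```

Is this the intended mechanism, or is there a supported set of -D flags for BLIS/FLAME?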

Any guidance on the proper way to build PyTorch with these libraries would be greatly appreciated.

Thank you!