Linking Ruy Quantized Matrix Multiplication

I’ve built PyTorch from source just to get the quantized matrix multiplication operator that relies on Ruy. There is an #include <ruy/ruy.h> directive in aten/src/ATen/native/quantized/cpu/RuyUtils.h. I’ve also built the google/ruy repository and copied its ruy/ruy directory into aten/src (because the header lives at ruy/ruy/ruy.h).

python setup.py build manages to find ruy/ruy.h, but the link step fails with the following errors:

Building wheel torch-1.13.0a0+git0f561f0
-- Building version 1.13.0a0+git0f561f0
cmake --build . --target install --config Release
[4/13] Linking CXX executable bin/FileStoreTest
FAILED: bin/FileStoreTest 
: && /usr/bin/c++ -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow -DHAVE_AVX2_CPU_DEFINITION -O3 -DNDEBUG -DNDEBUG -rdynamic test_cpp_c10d/CMakeFiles/FileStoreTest.dir/FileStoreTest.cpp.o -o bin/FileStoreTest  -Wl,-rpath,/ec2-user/repos/pytorch/build/lib:  lib/libtorch_cpu.so  lib/libgtest_main.a  -lpthread  lib/libprotobuf.a  lib/libc10.so  lib/libgtest.a  -pthread && :
lib/libtorch_cpu.so: undefined reference to `ruy::Ctx::clear_performance_advisories()'
lib/libtorch_cpu.so: undefined reference to `ruy::Pack8bitColMajorForAvx2(signed char const*, signed char, signed char const*, int, int, int, signed char*, int*)'
lib/libtorch_cpu.so: undefined reference to `ruy::Pack8bitColMajorForAvx(signed char const*, signed char, signed char const*, int, int, int, signed char*, int*)'
lib/libtorch_cpu.so: undefined reference to `ruy::Kernel8bitAvx2SingleCol(ruy::KernelParams8bit<8, 8> const&)'
lib/libtorch_cpu.so: undefined reference to `ruy::get_ctx(ruy::Context*)'
lib/libtorch_cpu.so: undefined reference to `ruy::Context::~Context()'
lib/libtorch_cpu.so: undefined reference to `ruy::Kernel8bitAvx(ruy::KernelParams8bit<8, 8> const&)'
lib/libtorch_cpu.so: undefined reference to `ruy::Kernel8bitAvx512(ruy::KernelParams8bit<16, 16> const&)'
lib/libtorch_cpu.so: undefined reference to `ruy::MulFrontEndFromTrMulParams(ruy::Ctx*, ruy::TrMulParams*)'
lib/libtorch_cpu.so: undefined reference to `ruy::Ctx::SelectPath(ruy::Path)'
lib/libtorch_cpu.so: undefined reference to `ruy::Pack8bitColMajorForAvx512(signed char const*, signed char, signed char const*, int, int, int, signed char*, int*)'
lib/libtorch_cpu.so: undefined reference to `ruy::Pack8bitRowMajorForAvx512(unsigned char const*, int, int, signed char*, int, int, int, int, int, int, int, int*)'
lib/libtorch_cpu.so: undefined reference to `ruy::Kernel8bitAvxSingleCol(ruy::KernelParams8bit<8, 8> const&)'
lib/libtorch_cpu.so: undefined reference to `ruy::Ctx::set_performance_advisory(ruy::PerformanceAdvisory)'
lib/libtorch_cpu.so: undefined reference to `ruy::Pack8bitRowMajorForAvx2(unsigned char const*, int, int, signed char*, int, int, int, int, int, int, int, int*)'
lib/libtorch_cpu.so: undefined reference to `ruy::Allocator::AllocateBytes(long)'
lib/libtorch_cpu.so: undefined reference to `ruy::detail::MultiplyByQuantizedMultiplier(int, int, int)'
lib/libtorch_cpu.so: undefined reference to `ruy::Kernel8bitAvx2(ruy::KernelParams8bit<8, 8> const&)'
lib/libtorch_cpu.so: undefined reference to `ruy::Ctx::GetMainAllocator()'
lib/libtorch_cpu.so: undefined reference to `ruy::Kernel8bitAvx512SingleCol(ruy::KernelParams8bit<16, 16> const&)'
lib/libtorch_cpu.so: undefined reference to `ruy::Pack8bitRowMajorForAvx(unsigned char const*, int, int, signed char*, int, int, int, int, int, int, int, int*)'
lib/libtorch_cpu.so: undefined reference to `ruy::Context::Context()'
collect2: error: ld returned 1 exit status
[5/13] Linking CXX executable bin/TCPStoreTest
FAILED: bin/TCPStoreTest 
: && /usr/bin/c++ -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow -DHAVE_AVX2_CPU_DEFINITION -O3 -DNDEBUG -DNDEBUG -rdynamic test_cpp_c10d/CMakeFiles/TCPStoreTest.dir/TCPStoreTest.cpp.o -o bin/TCPStoreTest  -Wl,-rpath,/ec2-user/repos/pytorch/build/lib:  lib/libtorch_cpu.so  lib/libgtest_main.a  -lpthread  lib/libprotobuf.a  lib/libc10.so  lib/libgtest.a  -pthread && :
lib/libtorch_cpu.so: undefined reference to `ruy::Ctx::clear_performance_advisories()'
lib/libtorch_cpu.so: undefined reference to `ruy::Pack8bitColMajorForAvx2(signed char const*, signed char, signed char const*, int, int, int, signed char*, int*)'
lib/libtorch_cpu.so: undefined reference to `ruy::Pack8bitColMajorForAvx(signed char const*, signed char, signed char const*, int, int, int, signed char*, int*)'
lib/libtorch_cpu.so: undefined reference to `ruy::Kernel8bitAvx2SingleCol(ruy::KernelParams8bit<8, 8> const&)'
lib/libtorch_cpu.so: undefined reference to `ruy::get_ctx(ruy::Context*)'
lib/libtorch_cpu.so: undefined reference to `ruy::Context::~Context()'
lib/libtorch_cpu.so: undefined reference to `ruy::Kernel8bitAvx(ruy::KernelParams8bit<8, 8> const&)'
lib/libtorch_cpu.so: undefined reference to `ruy::Kernel8bitAvx512(ruy::KernelParams8bit<16, 16> const&)'
lib/libtorch_cpu.so: undefined reference to `ruy::MulFrontEndFromTrMulParams(ruy::Ctx*, ruy::TrMulParams*)'
lib/libtorch_cpu.so: undefined reference to `ruy::Ctx::SelectPath(ruy::Path)'
lib/libtorch_cpu.so: undefined reference to `ruy::Pack8bitColMajorForAvx512(signed char const*, signed char, signed char const*, int, int, int, signed char*, int*)'
lib/libtorch_cpu.so: undefined reference to `ruy::Pack8bitRowMajorForAvx512(unsigned char const*, int, int, signed char*, int, int, int, int, int, int, int, int*)'
lib/libtorch_cpu.so: undefined reference to `ruy::Kernel8bitAvxSingleCol(ruy::KernelParams8bit<8, 8> const&)'
lib/libtorch_cpu.so: undefined reference to `ruy::Ctx::set_performance_advisory(ruy::PerformanceAdvisory)'
lib/libtorch_cpu.so: undefined reference to `ruy::Pack8bitRowMajorForAvx2(unsigned char const*, int, int, signed char*, int, int, int, int, int, int, int, int*)'
lib/libtorch_cpu.so: undefined reference to `ruy::Allocator::AllocateBytes(long)'
lib/libtorch_cpu.so: undefined reference to `ruy::detail::MultiplyByQuantizedMultiplier(int, int, int)'
lib/libtorch_cpu.so: undefined reference to `ruy::Kernel8bitAvx2(ruy::KernelParams8bit<8, 8> const&)'
lib/libtorch_cpu.so: undefined reference to `ruy::Ctx::GetMainAllocator()'
lib/libtorch_cpu.so: undefined reference to `ruy::Kernel8bitAvx512SingleCol(ruy::KernelParams8bit<16, 16> const&)'
lib/libtorch_cpu.so: undefined reference to `ruy::Pack8bitRowMajorForAvx(unsigned char const*, int, int, signed char*, int, int, int, int, int, int, int, int*)'
lib/libtorch_cpu.so: undefined reference to `ruy::Context::Context()'
collect2: error: ld returned 1 exit status
[6/13] Linking CXX executable bin/HashStoreTest
FAILED: bin/HashStoreTest 
: && /usr/bin/c++ -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow -DHAVE_AVX2_CPU_DEFINITION -O3 -DNDEBUG -DNDEBUG -rdynamic test_cpp_c10d/CMakeFiles/HashStoreTest.dir/HashStoreTest.cpp.o -o bin/HashStoreTest  -Wl,-rpath,/ec2-user/repos/pytorch/build/lib:  lib/libtorch_cpu.so  lib/libgtest_main.a  -lpthread  lib/libprotobuf.a  lib/libc10.so  lib/libgtest.a  -pthread && :
lib/libtorch_cpu.so: undefined reference to `ruy::Ctx::clear_performance_advisories()'
lib/libtorch_cpu.so: undefined reference to `ruy::Pack8bitColMajorForAvx2(signed char const*, signed char, signed char const*, int, int, int, signed char*, int*)'
lib/libtorch_cpu.so: undefined reference to `ruy::Pack8bitColMajorForAvx(signed char const*, signed char, signed char const*, int, int, int, signed char*, int*)'
lib/libtorch_cpu.so: undefined reference to `ruy::Kernel8bitAvx2SingleCol(ruy::KernelParams8bit<8, 8> const&)'
lib/libtorch_cpu.so: undefined reference to `ruy::get_ctx(ruy::Context*)'
lib/libtorch_cpu.so: undefined reference to `ruy::Context::~Context()'
lib/libtorch_cpu.so: undefined reference to `ruy::Kernel8bitAvx(ruy::KernelParams8bit<8, 8> const&)'
lib/libtorch_cpu.so: undefined reference to `ruy::Kernel8bitAvx512(ruy::KernelParams8bit<16, 16> const&)'
lib/libtorch_cpu.so: undefined reference to `ruy::MulFrontEndFromTrMulParams(ruy::Ctx*, ruy::TrMulParams*)'
lib/libtorch_cpu.so: undefined reference to `ruy::Ctx::SelectPath(ruy::Path)'
lib/libtorch_cpu.so: undefined reference to `ruy::Pack8bitColMajorForAvx512(signed char const*, signed char, signed char const*, int, int, int, signed char*, int*)'
lib/libtorch_cpu.so: undefined reference to `ruy::Pack8bitRowMajorForAvx512(unsigned char const*, int, int, signed char*, int, int, int, int, int, int, int, int*)'
lib/libtorch_cpu.so: undefined reference to `ruy::Kernel8bitAvxSingleCol(ruy::KernelParams8bit<8, 8> const&)'
lib/libtorch_cpu.so: undefined reference to `ruy::Ctx::set_performance_advisory(ruy::PerformanceAdvisory)'
lib/libtorch_cpu.so: undefined reference to `ruy::Pack8bitRowMajorForAvx2(unsigned char const*, int, int, signed char*, int, int, int, int, int, int, int, int*)'
lib/libtorch_cpu.so: undefined reference to `ruy::Allocator::AllocateBytes(long)'
lib/libtorch_cpu.so: undefined reference to `ruy::detail::MultiplyByQuantizedMultiplier(int, int, int)'
lib/libtorch_cpu.so: undefined reference to `ruy::Kernel8bitAvx2(ruy::KernelParams8bit<8, 8> const&)'
lib/libtorch_cpu.so: undefined reference to `ruy::Ctx::GetMainAllocator()'
lib/libtorch_cpu.so: undefined reference to `ruy::Kernel8bitAvx512SingleCol(ruy::KernelParams8bit<16, 16> const&)'
lib/libtorch_cpu.so: undefined reference to `ruy::Pack8bitRowMajorForAvx(unsigned char const*, int, int, signed char*, int, int, int, int, int, int, int, int*)'
lib/libtorch_cpu.so: undefined reference to `ruy::Context::Context()'
collect2: error: ld returned 1 exit status
[7/13] Linking CXX executable bin/ProcessGroupGlooTest
FAILED: bin/ProcessGroupGlooTest 
: && /usr/bin/c++ -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow -DHAVE_AVX2_CPU_DEFINITION -O3 -DNDEBUG -DNDEBUG -rdynamic test_cpp_c10d/CMakeFiles/ProcessGroupGlooTest.dir/ProcessGroupGlooTest.cpp.o -o bin/ProcessGroupGlooTest  -Wl,-rpath,/ec2-user/repos/pytorch/build/lib:/usr/local/cuda/lib64:  lib/libtorch_cpu.so  lib/libc10d_cuda_test.so  lib/libgtest_main.a  -lpthread  lib/libtorch_cuda.so  lib/libc10_cuda.so  /usr/local/cuda/lib64/libcudart.so  /usr/local/cuda/lib64/libnvToolsExt.so  lib/libprotobuf.a  lib/libc10.so  -Wl,--no-as-needed,"/ec2-user/repos/pytorch/build/lib/libtorch_cpu.so" -Wl,--as-needed  /usr/local/cuda/lib64/libcufft.so  /usr/local/cuda/lib64/libcurand.so  /usr/local/cuda/lib64/libcublas.so  /usr/lib/x86_64-linux-gnu/libcudnn.so  lib/libgtest.a  -pthread && :
lib/libtorch_cpu.so: undefined reference to `ruy::Ctx::clear_performance_advisories()'
lib/libtorch_cpu.so: undefined reference to `ruy::Pack8bitColMajorForAvx2(signed char const*, signed char, signed char const*, int, int, int, signed char*, int*)'
lib/libtorch_cpu.so: undefined reference to `ruy::Pack8bitColMajorForAvx(signed char const*, signed char, signed char const*, int, int, int, signed char*, int*)'
lib/libtorch_cpu.so: undefined reference to `ruy::Kernel8bitAvx2SingleCol(ruy::KernelParams8bit<8, 8> const&)'
lib/libtorch_cpu.so: undefined reference to `ruy::get_ctx(ruy::Context*)'
lib/libtorch_cpu.so: undefined reference to `ruy::Context::~Context()'
lib/libtorch_cpu.so: undefined reference to `ruy::Kernel8bitAvx(ruy::KernelParams8bit<8, 8> const&)'
lib/libtorch_cpu.so: undefined reference to `ruy::Kernel8bitAvx512(ruy::KernelParams8bit<16, 16> const&)'
lib/libtorch_cpu.so: undefined reference to `ruy::MulFrontEndFromTrMulParams(ruy::Ctx*, ruy::TrMulParams*)'
lib/libtorch_cpu.so: undefined reference to `ruy::Ctx::SelectPath(ruy::Path)'
lib/libtorch_cpu.so: undefined reference to `ruy::Pack8bitColMajorForAvx512(signed char const*, signed char, signed char const*, int, int, int, signed char*, int*)'
lib/libtorch_cpu.so: undefined reference to `ruy::Pack8bitRowMajorForAvx512(unsigned char const*, int, int, signed char*, int, int, int, int, int, int, int, int*)'
lib/libtorch_cpu.so: undefined reference to `ruy::Kernel8bitAvxSingleCol(ruy::KernelParams8bit<8, 8> const&)'
lib/libtorch_cpu.so: undefined reference to `ruy::Ctx::set_performance_advisory(ruy::PerformanceAdvisory)'
lib/libtorch_cpu.so: undefined reference to `ruy::Pack8bitRowMajorForAvx2(unsigned char const*, int, int, signed char*, int, int, int, int, int, int, int, int*)'
lib/libtorch_cpu.so: undefined reference to `ruy::Allocator::AllocateBytes(long)'
lib/libtorch_cpu.so: undefined reference to `ruy::detail::MultiplyByQuantizedMultiplier(int, int, int)'
lib/libtorch_cpu.so: undefined reference to `ruy::Kernel8bitAvx2(ruy::KernelParams8bit<8, 8> const&)'
lib/libtorch_cpu.so: undefined reference to `ruy::Ctx::GetMainAllocator()'
lib/libtorch_cpu.so: undefined reference to `ruy::Kernel8bitAvx512SingleCol(ruy::KernelParams8bit<16, 16> const&)'
lib/libtorch_cpu.so: undefined reference to `ruy::Pack8bitRowMajorForAvx(unsigned char const*, int, int, signed char*, int, int, int, int, int, int, int, int*)'
lib/libtorch_cpu.so: undefined reference to `ruy::Context::Context()'
collect2: error: ld returned 1 exit status
[8/13] Linking CXX executable bin/ProcessGroupGlooAsyncTest
FAILED: bin/ProcessGroupGlooAsyncTest 
: && /usr/bin/c++ -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow -DHAVE_AVX2_CPU_DEFINITION -O3 -DNDEBUG -DNDEBUG -rdynamic test_cpp_c10d/CMakeFiles/ProcessGroupGlooAsyncTest.dir/ProcessGroupGlooAsyncTest.cpp.o -o bin/ProcessGroupGlooAsyncTest  -Wl,-rpath,/ec2-user/repos/pytorch/build/lib:/usr/local/cuda/lib64  lib/libtorch_cpu.so  lib/libc10d_cuda_test.so  lib/libgtest_main.a  -lpthread  lib/libtorch_cuda.so  lib/libc10_cuda.so  /usr/local/cuda/lib64/libcudart.so  /usr/local/cuda/lib64/libnvToolsExt.so  lib/libprotobuf.a  lib/libc10.so  -Wl,--no-as-needed,"/ec2-user/repos/pytorch/build/lib/libtorch_cpu.so" -Wl,--as-needed  /usr/local/cuda/lib64/libcufft.so  /usr/local/cuda/lib64/libcurand.so  /usr/local/cuda/lib64/libcublas.so  /usr/lib/x86_64-linux-gnu/libcudnn.so  lib/libgtest.a  -pthread && :
lib/libtorch_cpu.so: undefined reference to `ruy::Context::Context()'
lib/libtorch_cpu.so: undefined reference to `ruy::Ctx::clear_performance_advisories()'
lib/libtorch_cpu.so: undefined reference to `ruy::Kernel8bitAvx2SingleCol(ruy::KernelParams8bit<8, 8> const&)'
lib/libtorch_cpu.so: undefined reference to `ruy::MulFrontEndFromTrMulParams(ruy::Ctx*, ruy::TrMulParams*)'
lib/libtorch_cpu.so: undefined reference to `ruy::Ctx::SelectPath(ruy::Path)'
lib/libtorch_cpu.so: undefined reference to `ruy::Pack8bitColMajorForAvx512(signed char const*, signed char, signed char const*, int, int, int, signed char*, int*)'
lib/libtorch_cpu.so: undefined reference to `ruy::Kernel8bitAvx512(ruy::KernelParams8bit<16, 16> const&)'
lib/libtorch_cpu.so: undefined reference to `ruy::Pack8bitRowMajorForAvx2(unsigned char const*, int, int, signed char*, int, int, int, int, int, int, int, int*)'
lib/libtorch_cpu.so: undefined reference to `ruy::Ctx::set_performance_advisory(ruy::PerformanceAdvisory)'
lib/libtorch_cpu.so: undefined reference to `ruy::Allocator::AllocateBytes(long)'
lib/libtorch_cpu.so: undefined reference to `ruy::detail::MultiplyByQuantizedMultiplier(int, int, int)'
lib/libtorch_cpu.so: undefined reference to `ruy::Ctx::GetMainAllocator()'
lib/libtorch_cpu.so: undefined reference to `ruy::Kernel8bitAvx512SingleCol(ruy::KernelParams8bit<16, 16> const&)'
lib/libtorch_cpu.so: undefined reference to `ruy::Pack8bitRowMajorForAvx(unsigned char const*, int, int, signed char*, int, int, int, int, int, int, int, int*)'
lib/libtorch_cpu.so: undefined reference to `ruy::Pack8bitColMajorForAvx2(signed char const*, signed char, signed char const*, int, int, int, signed char*, int*)'
lib/libtorch_cpu.so: undefined reference to `ruy::Pack8bitColMajorForAvx(signed char const*, signed char, signed char const*, int, int, int, signed char*, int*)'
lib/libtorch_cpu.so: undefined reference to `ruy::get_ctx(ruy::Context*)'
lib/libtorch_cpu.so: undefined reference to `ruy::Context::~Context()'
lib/libtorch_cpu.so: undefined reference to `ruy::Kernel8bitAvx(ruy::KernelParams8bit<8, 8> const&)'
lib/libtorch_cpu.so: undefined reference to `ruy::Pack8bitRowMajorForAvx512(unsigned char const*, int, int, signed char*, int, int, int, int, int, int, int, int*)'
lib/libtorch_cpu.so: undefined reference to `ruy::Kernel8bitAvxSingleCol(ruy::KernelParams8bit<8, 8> const&)'
lib/libtorch_cpu.so: undefined reference to `ruy::Kernel8bitAvx2(ruy::KernelParams8bit<8, 8> const&)'
collect2: error: ld returned 1 exit status
[9/13] Linking CXX shared library lib/libtorch_cuda_linalg.so
ninja: build stopped: subcommand failed.

I’ve also tried setting the -DUSE_RUY_QMATMUL flag to 1 in buckbuild.bzl and defs.bzl, but it had no effect. I don’t understand what I need to do to link Ruy successfully, so I’d be grateful for any help.
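
For readers skimming the log above, the repeated linker output can be condensed with a short script. This is only a sketch: the function name undefined_symbols and the three-line sample are illustrative, and the regular expression assumes GNU ld's "undefined reference to" message format as it appears in this log.

```python
import re
from collections import defaultdict

# Matches GNU ld lines such as:
#   lib/libtorch_cpu.so: undefined reference to `ruy::Context::Context()'
UNDEFINED_REF = re.compile(
    r"^(?P<obj>\S+): undefined reference to `(?P<sym>.*)'\s*$"
)


def undefined_symbols(log_text):
    """Map each object or library named by the linker to the set of
    symbols it could not resolve, dropping repeated lines."""
    found = defaultdict(set)
    for line in log_text.splitlines():
        match = UNDEFINED_REF.match(line.strip())
        if match:
            found[match.group("obj")].add(match.group("sym"))
    return dict(found)


# A three-line excerpt in the same format as the log above (one line repeated).
sample = """\
lib/libtorch_cpu.so: undefined reference to `ruy::Context::Context()'
lib/libtorch_cpu.so: undefined reference to `ruy::get_ctx(ruy::Context*)'
lib/libtorch_cpu.so: undefined reference to `ruy::Context::Context()'
collect2: error: ld returned 1 exit status
"""

for obj, symbols in undefined_symbols(sample).items():
    print(obj)
    for symbol in sorted(symbols):
        print("   ", symbol)
```

In the log above, every unresolved symbol is in the ruy namespace and every line names lib/libtorch_cpu.so as the referrer. That pattern suggests the Ruy headers were found at compile time but no compiled Ruy code was passed to the link step.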

Hi Mateo, are you able to build ruy on its own? If so, what steps did you take to build it?

Also, there’s a rule in third_party/BUCK.oss that looks for the ruy source files, so maybe try putting the ruy folder in third_party/.

Hi Salil, thanks for the reply!

I’ve tried that, but I’m probably missing some step in the build pipeline. ninja terminates the compilation because <ruy/ruy.h> can’t be included. Running python setup.py develop gives me this error:

/ec2-user/repos/pytorch/aten/src/ATen/native/quantized/cpu/RuyUtils.h:7:10: fatal error: ruy/ruy.h: No such file or directory
 #include <ruy/ruy.h>
          ^~~~~~~~~~~
compilation terminated.
[2/13] Building CXX object caffe2/CMak...Ten/native/quantized/cpu/qmatmul.cpp.o
FAILED: caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/cpu/qmatmul.cpp.o 
/usr/bin/c++ -DAT_PER_OPERATOR_HEADERS -DBUILD_ONEDNN_GRAPH -DCPUINFO_SUPPORTED_PLATFORM=1 -DFMT_HEADER_ONLY=1 -DFXDIV_USE_INLINE_ASSEMBLY=0 -DHAVE_MALLOC_USABLE_SIZE=1 -DHAVE_MMAP=1 -DHAVE_SHM_OPEN=1 -DHAVE_SHM_UNLINK=1 -DMINIZ_DISABLE_ZIP_READER_CRC32_CHECKS -DNNP_CONVOLUTION_ONLY=0 -DNNP_INFERENCE_ONLY=0 -DONNXIFI_ENABLE_EXT=1 -DONNX_ML=1 -DONNX_NAMESPACE=onnx_torch -DUSE_C10D_GLOO -DUSE_DISTRIBUTED -DUSE_EXTERNAL_MZCRC -DUSE_RPC -DUSE_TENSORPIPE -D_FILE_OFFSET_BITS=64 -Dtorch_cpu_EXPORTS -I/ec2-user/repos/pytorch/build/aten/src -I/ec2-user/repos/pytorch/aten/src -I/ec2-user/repos/pytorch/build -I/ec2-user/repos/pytorch -I/ec2-user/repos/pytorch/cmake/../third_party/benchmark/include -I/ec2-user/repos/pytorch/cmake/../third_party/cudnn_frontend/include -I/ec2-user/repos/pytorch/third_party/onnx -I/ec2-user/repos/pytorch/build/third_party/onnx -I/ec2-user/repos/pytorch/third_party/foxi -I/ec2-user/repos/pytorch/build/third_party/foxi -I/ec2-user/repos/pytorch/torch/csrc/api -I/ec2-user/repos/pytorch/torch/csrc/api/include -I/ec2-user/repos/pytorch/caffe2/aten/src/TH -I/ec2-user/repos/pytorch/build/caffe2/aten/src/TH -I/ec2-user/repos/pytorch/build/caffe2/aten/src -I/ec2-user/repos/pytorch/build/caffe2/../aten/src -I/ec2-user/repos/pytorch/torch/csrc -I/ec2-user/repos/pytorch/third_party/miniz-2.1.0 -I/ec2-user/repos/pytorch/third_party/kineto/libkineto/include -I/ec2-user/repos/pytorch/third_party/kineto/libkineto/src -I/ec2-user/repos/pytorch/torch/csrc/distributed -I/ec2-user/repos/pytorch/aten/../third_party/catch/single_include -I/ec2-user/repos/pytorch/aten/src/ATen/.. -I/ec2-user/repos/pytorch/third_party/FXdiv/include -I/ec2-user/repos/pytorch/c10/.. 
-I/ec2-user/repos/pytorch/third_party/pthreadpool/include -I/ec2-user/repos/pytorch/third_party/cpuinfo/include -I/ec2-user/repos/pytorch/third_party/QNNPACK/include -I/ec2-user/repos/pytorch/aten/src/ATen/native/quantized/cpu/qnnpack/include -I/ec2-user/repos/pytorch/aten/src/ATen/native/quantized/cpu/qnnpack/src -I/ec2-user/repos/pytorch/third_party/cpuinfo/deps/clog/include -I/ec2-user/repos/pytorch/third_party/NNPACK/include -I/ec2-user/repos/pytorch/third_party/fbgemm/include -I/ec2-user/repos/pytorch/third_party/fbgemm -I/ec2-user/repos/pytorch/third_party/fbgemm/third_party/asmjit/src -I/ec2-user/repos/pytorch/third_party/ittapi/src/ittnotify -I/ec2-user/repos/pytorch/third_party/FP16/include -I/ec2-user/repos/pytorch/third_party/tensorpipe -I/ec2-user/repos/pytorch/build/third_party/tensorpipe -I/ec2-user/repos/pytorch/third_party/tensorpipe/third_party/libnop/include -I/ec2-user/repos/pytorch/third_party/fmt/include -I/ec2-user/repos/pytorch/build/third_party/ideep/mkl-dnn/third_party/oneDNN/include -I/ec2-user/repos/pytorch/third_party/ideep/mkl-dnn/third_party/oneDNN/src/../include -I/ec2-user/repos/pytorch/third_party/flatbuffers/include -isystem /ec2-user/repos/pytorch/build/third_party/gloo -isystem /ec2-user/repos/pytorch/cmake/../third_party/gloo -isystem /ec2-user/repos/pytorch/cmake/../third_party/googletest/googlemock/include -isystem /ec2-user/repos/pytorch/cmake/../third_party/googletest/googletest/include -isystem /ec2-user/repos/pytorch/third_party/protobuf/src -isystem /ec2-user/repos/pytorch/third_party/gemmlowp -isystem /ec2-user/repos/pytorch/third_party/neon2sse -isystem /ec2-user/repos/pytorch/third_party/XNNPACK/include -isystem /ec2-user/repos/pytorch/third_party/ittapi/include -isystem /ec2-user/repos/pytorch/cmake/../third_party/eigen -isystem /usr/local/cuda/include -isystem /ec2-user/repos/pytorch/third_party/ideep/mkl-dnn/third_party/oneDNN/include -isystem /ec2-user/repos/pytorch/third_party/ideep/include -isystem 
/ec2-user/repos/pytorch/third_party/ideep/mkl-dnn/include -isystem /ec2-user/repos/pytorch/build/include -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow -DHAVE_AVX2_CPU_DEFINITION -O3 -DNDEBUG -DNDEBUG -fPIC -DCAFFE2_USE_GLOO -DTH_HAVE_THREAD -Wall -Wextra -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-missing-field-initializers -Wno-write-strings -Wno-unknown-pragmas -Wno-type-limits -Wno-array-bounds -Wno-sign-compare -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-missing-braces -Wno-maybe-uninitialized -fvisibility=hidden -O2 -fopenmp -DCAFFE2_BUILD_MAIN_LIB -pthread -DASMJIT_STATIC -std=gnu++14 -MD -MT caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/cpu/qmatmul.cpp.o -MF caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/cpu/qmatmul.cpp.o.d -o caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/cpu/qmatmul.cpp.o -c /ec2-user/repos/pytorch/aten/src/ATen/native/quantized/cpu/qmatmul.cpp
In file included from /ec2-user/repos/pytorch/aten/src/ATen/native/quantized/cpu/qmatmul.cpp:8:0:
/ec2-user/repos/pytorch/aten/src/ATen/native/quantized/cpu/RuyUtils.h:7:10: fatal error: ruy/ruy.h: No such file or directory
 #include <ruy/ruy.h>
          ^~~~~~~~~~~
compilation terminated.
ninja: build stopped: subcommand failed.

I’ve also tried building ruy on its own, from its directory, with cmake . and make.

I am not too familiar with what setup.py is doing or how building with cmake works, but here is one idea: look for mentions of third_party in CMakeLists.txt or setup.py, and see whether there’s an appropriate place to add ruy alongside the other third-party libraries that are currently listed.
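
One way to do that search is a small script like the one below. It is a rough sketch rather than anything PyTorch-specific: the helper name find_mentions, the default file names and the .cmake suffix filter are all assumptions made for illustration.

```python
from pathlib import Path


def find_mentions(root, needle="third_party",
                  names=("CMakeLists.txt", "setup.py"),
                  suffixes=(".cmake",)):
    """Return (path, line_number, line) for every line containing `needle`
    in build files under `root`. Only files whose name is in `names` or
    whose suffix is in `suffixes` are scanned."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        if path.name not in names and path.suffix not in suffixes:
            continue
        try:
            text = path.read_text(encoding="utf-8", errors="replace")
        except OSError:
            continue  # unreadable file: skip it rather than abort the scan
        for lineno, line in enumerate(text.splitlines(), start=1):
            if needle in line:
                hits.append((path, lineno, line.strip()))
    return hits


# Example usage (the checkout path is a placeholder):
# for path, lineno, line in find_mentions("/path/to/pytorch"):
#     print(f"{path}:{lineno}: {line}")
```

Calling it again with needle="ruy" would show whether the CMake files mention Ruy at all.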