Build fails at linking torch_shm_manager on aarch64

On a server with a Neoverse N1 CPU (aarch64) and NVIDIA Tesla V100S GPUs, I am trying to build PyTorch 1.11.0 with CUDA 11.3. The build ultimately fails with a linker error in torch_shm_manager. I am using these commands:

export CMAKE_PREFIX_PATH=${CONDA_PREFIX:-"$(dirname $(which conda))/../"}
python setup.py install

The error, which I have not been able to resolve, is the following:

[3631/3634] Linking CXX executable bin/torch_shm_manager
FAILED: bin/torch_shm_manager 
: && /home/users/kaftan/anaconda3/envs/pt110cu113/bin/aarch64-conda-linux-gnu-c++ -fvisibility-inlines-hidden -std=c++17 -fmessage-length=0 -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O3 -pipe -isystem /home/users/kaftan/anaconda3/envs/pt110cu113/include -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DUSE_KINETO -DLIBKINETO_NOCUPTI -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow -g -fno-omit-frame-pointer -O0 -Wl,-O2 -Wl,--sort-common -Wl,--as-needed -Wl,-z,relro -Wl,-z,now -Wl,--allow-shlib-undefined -Wl,-rpath,/home/users/kaftan/anaconda3/envs/pt110cu113/lib -Wl,-rpath-link,/home/users/kaftan/anaconda3/envs/pt110cu113/lib -L/home/users/kaftan/anaconda3/envs/pt110cu113/lib -rdynamic    -rdynamic caffe2/torch/lib/libshm/CMakeFiles/torch_shm_manager.dir/manager.cpp.o -o bin/torch_shm_manager  -Wl,-rpath,/home/users/kaftan/pytorch/build/lib:/usr/local/cuda-11.3/lib64:  lib/libshm.so  -lrt  lib/libtorch.so  -Wl,--no-as-needed,"/home/users/kaftan/pytorch/build/lib/libtorch_cpu.so" -Wl,--as-needed  lib/libprotobufd.a  -pthread  -Wl,--no-as-needed,"/home/users/kaftan/pytorch/build/lib/libtorch_cuda.so" -Wl,--as-needed  lib/libc10_cuda.so  /usr/local/cuda-11.3/lib64/libcudart.so  /usr/local/cuda-11.3/lib64/libnvToolsExt.so  
/usr/local/cuda-11.3/lib64/libcufft.so  /usr/local/cuda-11.3/lib64/libcurand.so  /usr/local/cuda-11.3/lib64/libcublas.so  lib/libc10.so && :
/home/users/kaftan/anaconda3/envs/pt110cu113/bin/../lib/gcc/aarch64-conda-linux-gnu/10.4.0/../../../../aarch64-conda-linux-gnu/bin/ld: bin/torch_shm_manager: hidden symbol `__aarch64_cas4_sync' in /home/users/kaftan/anaconda3/envs/pt110cu113/bin/../lib/gcc/aarch64-conda-linux-gnu/10.4.0/libgcc.a(cas_4_5.o) is referenced by DSO
/home/users/kaftan/anaconda3/envs/pt110cu113/bin/../lib/gcc/aarch64-conda-linux-gnu/10.4.0/../../../../aarch64-conda-linux-gnu/bin/ld: final link failed: bad value
collect2: error: ld returned 1 exit status
[3632/3634] Linking CXX shared library lib/libtorch_python.so
ninja: build stopped: subcommand failed.
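
The "hidden symbol ... is referenced by DSO" message means that at least one shared library on the link line has an undefined reference to `__aarch64_cas4_sync`, while the only definition the linker found is the hidden one in libgcc.a. One way to narrow this down might be to check which library in build/lib carries that reference. The following is only a minimal sketch, assuming binutils `nm` is on the PATH; the function names `references_symbol` and `find_dso_refs` are hypothetical helpers, not part of the PyTorch build.

```shell
# Hypothetical helper: succeeds if the `nm` output on stdin lists the
# given symbol as undefined (type "U").
references_symbol() {
    awk -v sym="$1" '$1 == "U" && $2 == sym { found = 1 } END { exit !found }'
}

# Hypothetical helper: print every *.so directly under a directory whose
# dynamic symbol table has an undefined reference to the given symbol.
find_dso_refs() {
    sym="$1"
    dir="$2"
    for lib in "$dir"/*.so; do
        # Skip the unexpanded glob when the directory has no *.so files.
        [ -e "$lib" ] || continue
        if nm -D --undefined-only "$lib" 2>/dev/null | references_symbol "$sym"; then
            printf '%s\n' "$lib"
        fi
    done
}

# Example call for this build tree (path taken from the log above):
#   find_dso_refs __aarch64_cas4_sync /home/users/kaftan/pytorch/build/lib
```

Any library printed by the example call would be a candidate for the DSO named in the error. The CUDA libraries under /usr/local/cuda-11.3/lib64 could be checked the same way.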

To work around earlier build failures, I have already set BUILD_TEST=0, USE_BREAKPAD=0 and _GLIBCXX_USE_CXX11_ABI=0 in the environment.
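
For anyone trying to reproduce this, the settings above amount to roughly the following. This is a sketch assuming a POSIX shell, with the variables exported in the same session before the build command is run. The variable names and values are the ones listed above, and the build command itself is left as a comment.

```shell
# Settings used to get past earlier build failures
export BUILD_TEST=0
export USE_BREAKPAD=0
export _GLIBCXX_USE_CXX11_ABI=0

# Then run the build as before:
#   python setup.py install
```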

The build summary is the following:

-- ******** Summary ********
-- General:
--   CMake version         : 3.22.1
--   CMake command         : /home/users/kaftan/anaconda3/envs/pt110cu113/bin/cmake
--   System                : Linux
--   C++ compiler          : /home/users/kaftan/anaconda3/envs/pt110cu113/bin/aarch64-conda-linux-gnu-c++
--   C++ compiler id       : GNU
--   C++ compiler version  : 10.4.0
--   Using ccache if found : ON
--   Found ccache          : CCACHE_PROGRAM-NOTFOUND
--   CXX flags             : -fvisibility-inlines-hidden -std=c++17 -fmessage-length=0 -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O3 -pipe -isystem /home/users/kaftan/anaconda3/envs/pt110cu113/include -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DUSE_KINETO -DLIBKINETO_NOCUPTI -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow
--   Build type            : Debug
--   Compile definitions   : ONNX_ML=1;ONNXIFI_ENABLE_EXT=1;ONNX_NAMESPACE=onnx_torch;HAVE_MMAP=1;_FILE_OFFSET_BITS=64;HAVE_SHM_OPEN=1;HAVE_SHM_UNLINK=1;HAVE_MALLOC_USABLE_SIZE=1;USE_EXTERNAL_MZCRC;MINIZ_DISABLE_ZIP_READER_CRC32_CHECKS
--   CMAKE_PREFIX_PATH     : /home/users/kaftan/anaconda3/envs/pt110cu113/lib/python3.9/site-packages;/home/users/kaftan/anaconda3/envs/pt110cu113;/usr/local/cuda-11.3
--   CMAKE_INSTALL_PREFIX  : /home/users/kaftan/pytorch/torch
--   USE_GOLD_LINKER       : OFF
-- 
--   TORCH_VERSION         : 1.11.0
--   CAFFE2_VERSION        : 1.11.0
--   BUILD_CAFFE2          : OFF
--   BUILD_CAFFE2_OPS      : OFF
--   BUILD_CAFFE2_MOBILE   : OFF
--   BUILD_STATIC_RUNTIME_BENCHMARK: OFF
--   BUILD_TENSOREXPR_BENCHMARK: OFF
--   BUILD_NVFUSER_BENCHMARK: OFF
--   BUILD_BINARY          : OFF
--   BUILD_CUSTOM_PROTOBUF : ON
--     Link local protobuf : ON
--   BUILD_DOCS            : OFF
--   BUILD_PYTHON          : True
--     Python version      : 3.9.12
--     Python executable   : /home/users/kaftan/anaconda3/envs/pt110cu113/bin/python
--     Pythonlibs version  : 3.9.12
--     Python library      : /home/users/kaftan/anaconda3/envs/pt110cu113/lib/libpython3.9.a
--     Python includes     : /home/users/kaftan/anaconda3/envs/pt110cu113/include/python3.9
--     Python site-packages: lib/python3.9/site-packages
--   BUILD_SHARED_LIBS     : ON
--   CAFFE2_USE_MSVC_STATIC_RUNTIME     : OFF
--   BUILD_TEST            : False
--   BUILD_JNI             : OFF
--   BUILD_MOBILE_AUTOGRAD : OFF
--   BUILD_LITE_INTERPRETER: OFF
--   INTERN_BUILD_MOBILE   : 
--   USE_BLAS              : 1
--     BLAS                : open
--     BLAS_HAS_SBGEMM     : 
--   USE_LAPACK            : 1
--     LAPACK              : open
--   USE_ASAN              : OFF
--   USE_CPP_CODE_COVERAGE : OFF
--   USE_CUDA              : ON
--     Split CUDA          : OFF
--     CUDA static link    : OFF
--     USE_CUDNN           : OFF
--     USE_EXPERIMENTAL_CUDNN_V8_API: OFF
--     CUDA version        : 11.3
--     CUDA root directory : /usr/local/cuda-11.3
--     CUDA library        : /usr/local/cuda-11.3/lib64/stubs/libcuda.so
--     cudart library      : /usr/local/cuda-11.3/lib64/libcudart.so
--     cublas library      : /usr/local/cuda-11.3/lib64/libcublas.so
--     cufft library       : /usr/local/cuda-11.3/lib64/libcufft.so
--     curand library      : /usr/local/cuda-11.3/lib64/libcurand.so
--     nvrtc               : /usr/local/cuda-11.3/lib64/libnvrtc.so
--     CUDA include path   : /usr/local/cuda-11.3/include
--     NVCC executable     : /usr/local/cuda-11.3/bin/nvcc
--     CUDA compiler       : /usr/local/cuda-11.3/bin/nvcc
--     CUDA flags          :  -Xfatbin -compress-all -DONNX_NAMESPACE=onnx_torch -gencode arch=compute_70,code=sm_70 -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=integer_sign_change,--diag_suppress=useless_using_declaration,--diag_suppress=set_but_not_used,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=implicit_return_from_non_void_function,--diag_suppress=unsigned_compare_with_zero,--diag_suppress=declared_but_not_referenced,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda  -Wno-deprecated-gpu-targets --expt-extended-lambda -DCUDA_HAS_FP16=1 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__
--     CUDA host compiler  : 
--     CUDA --device-c     : OFF
--     USE_TENSORRT        : OFF
--   USE_ROCM              : OFF
--   USE_EIGEN_FOR_BLAS    : ON
--   USE_FBGEMM            : OFF
--     USE_FAKELOWP          : OFF
--   USE_KINETO            : ON
--   USE_FFMPEG            : OFF
--   USE_GFLAGS            : OFF
--   USE_GLOG              : OFF
--   USE_LEVELDB           : OFF
--   USE_LITE_PROTO        : OFF
--   USE_LMDB              : OFF
--   USE_METAL             : OFF
--   USE_PYTORCH_METAL     : OFF
--   USE_PYTORCH_METAL_EXPORT     : OFF
--   USE_FFTW              : OFF
--   USE_MKL               : OFF
--   USE_MKLDNN            : OFF
--   USE_NCCL              : ON
--     USE_SYSTEM_NCCL     : OFF
--   USE_NNPACK            : ON
--   USE_NUMPY             : ON
--   USE_OBSERVERS         : ON
--   USE_OPENCL            : OFF
--   USE_OPENCV            : OFF
--   USE_OPENMP            : ON
--   USE_TBB               : OFF
--   USE_VULKAN            : OFF
--   USE_PROF              : OFF
--   USE_QNNPACK           : ON
--   USE_PYTORCH_QNNPACK   : ON
--   USE_REDIS             : OFF
--   USE_ROCKSDB           : OFF
--   USE_ZMQ               : OFF
--   USE_DISTRIBUTED       : ON
--     USE_MPI               : OFF
--     USE_GLOO              : ON
--     USE_GLOO_WITH_OPENSSL : OFF
--     USE_TENSORPIPE        : ON
--   USE_DEPLOY           : OFF
--   USE_BREAKPAD         : 0
--   Public Dependencies  : caffe2::Threads
--   Private Dependencies : pthreadpool;cpuinfo;qnnpack;pytorch_qnnpack;nnpack;XNNPACK;fp16;gloo;tensorpipe;foxi_loader;rt;fmt::fmt-header-only;kineto;gcc_s;gcc;dl
--   USE_COREML_DELEGATE     : OFF

Please let me know if you need any more details about my build environment. Thank you for your help.

Versions

PyTorch version: N/A
Is debug build: N/A
CUDA used to build PyTorch: N/A
ROCM used to build PyTorch: N/A

OS: Ubuntu 22.04.1 LTS (aarch64)
GCC version: (conda-forge gcc 10.4.0-17) 10.4.0
Clang version: Could not collect
CMake version: version 3.22.1
Libc version: glibc-2.35

Python version: 3.9.12 (main, Jun 1 2022, 11:39:41) [GCC 10.2.0] (64-bit runtime)
Python platform: Linux-5.15.0-73-generic-aarch64-with-glibc2.35
Is CUDA available: N/A
CUDA runtime version: 11.3.109
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration:
GPU 0: Tesla V100S-PCIE-32GB
GPU 1: Tesla V100S-PCIE-32GB
GPU 2: Tesla V100S-PCIE-32GB

Nvidia driver version: 530.30.02
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: N/A

CPU:
Architecture: aarch64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 80
On-line CPU(s) list: 0-79
Vendor ID: ARM
Model name: Neoverse-N1
Model: 1
Thread(s) per core: 1
Core(s) per socket: 80
Socket(s): 1
Stepping: r3p1
Frequency boost: disabled
CPU max MHz: 3300.0000
CPU min MHz: 1000.0000
BogoMIPS: 50.00
Flags: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp ssbs
L1d cache: 5 MiB (80 instances)
L1i cache: 5 MiB (80 instances)
L2 cache: 80 MiB (80 instances)
NUMA node(s): 1
NUMA node0 CPU(s): 0-79
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Mmio stale data: Not affected
Vulnerability Retbleed: Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1: Mitigation; __user pointer sanitization
Vulnerability Spectre v2: Mitigation; CSV2, BHB
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected

Versions of relevant libraries:
[pip3] numpy==1.24.3
[conda] numpy 1.24.3 py39h8708280_0
[conda] numpy-base 1.24.3 py39h4a83355_0

The full build log is attached to this issue, in case anyone is interested:
https://github.com/pytorch/pytorch/issues/103150