I am trying to build PyTorch 1.11.0 with CUDA 11.3 on a Neoverse N1 server CPU (aarch64) with NVIDIA Tesla V100S GPUs. The build ultimately fails with a linker error in torch_shm_manager. I am using these commands:
export CMAKE_PREFIX_PATH=${CONDA_PREFIX:-"$(dirname $(which conda))/../"}
python setup.py install
The error, which I have not been able to resolve, is the following:
[3631/3634] Linking CXX executable bin/torch_shm_manager
FAILED: bin/torch_shm_manager
: && /home/users/kaftan/anaconda3/envs/pt110cu113/bin/aarch64-conda-linux-gnu-c++ -fvisibility-inlines-hidden -std=c++17 -fmessage-length=0 -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O3 -pipe -isystem /home/users/kaftan/anaconda3/envs/pt110cu113/include -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DUSE_KINETO -DLIBKINETO_NOCUPTI -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow -g -fno-omit-frame-pointer -O0 -Wl,-O2 -Wl,--sort-common -Wl,--as-needed -Wl,-z,relro -Wl,-z,now -Wl,--allow-shlib-undefined -Wl,-rpath,/home/users/kaftan/anaconda3/envs/pt110cu113/lib -Wl,-rpath-link,/home/users/kaftan/anaconda3/envs/pt110cu113/lib -L/home/users/kaftan/anaconda3/envs/pt110cu113/lib -rdynamic -rdynamic caffe2/torch/lib/libshm/CMakeFiles/torch_shm_manager.dir/manager.cpp.o -o bin/torch_shm_manager -Wl,-rpath,/home/users/kaftan/pytorch/build/lib:/usr/local/cuda-11.3/lib64: lib/libshm.so -lrt lib/libtorch.so -Wl,--no-as-needed,"/home/users/kaftan/pytorch/build/lib/libtorch_cpu.so" -Wl,--as-needed lib/libprotobufd.a -pthread -Wl,--no-as-needed,"/home/users/kaftan/pytorch/build/lib/libtorch_cuda.so" -Wl,--as-needed lib/libc10_cuda.so /usr/local/cuda-11.3/lib64/libcudart.so /usr/local/cuda-11.3/lib64/libnvToolsExt.so /usr/local/cuda-11.3/lib64/libcufft.so 
/usr/local/cuda-11.3/lib64/libcurand.so /usr/local/cuda-11.3/lib64/libcublas.so lib/libc10.so && :
/home/users/kaftan/anaconda3/envs/pt110cu113/bin/../lib/gcc/aarch64-conda-linux-gnu/10.4.0/../../../../aarch64-conda-linux-gnu/bin/ld: bin/torch_shm_manager: hidden symbol `__aarch64_cas4_sync' in /home/users/kaftan/anaconda3/envs/pt110cu113/bin/../lib/gcc/aarch64-conda-linux-gnu/10.4.0/libgcc.a(cas_4_5.o) is referenced by DSO
/home/users/kaftan/anaconda3/envs/pt110cu113/bin/../lib/gcc/aarch64-conda-linux-gnu/10.4.0/../../../../aarch64-conda-linux-gnu/bin/ld: final link failed: bad value
collect2: error: ld returned 1 exit status
[3632/3634] Linking CXX shared library lib/libtorch_python.so
ninja: build stopped: subcommand failed.
I have already disabled some components that caused the build to fail, setting BUILD_TEST=0, USE_BREAKPAD=0, and _GLIBCXX_USE_CXX11_ABI=0 in the environment.
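For reference, the settings above could be applied roughly as follows. This is only a sketch: it assumes the variables were exported in the same shell session before the build was started, which the report does not state explicitly. The build command itself is left as a comment so the snippet can be run without triggering a build.

```shell
# Assumption: the three settings were exported in the shell before building.
export BUILD_TEST=0                # skip building the C++ test binaries
export USE_BREAKPAD=0              # disable Breakpad integration
export _GLIBCXX_USE_CXX11_ABI=0    # select the pre-C++11 libstdc++ ABI

# Print the values so they can be compared with the CMake summary.
printf 'BUILD_TEST=%s\n' "$BUILD_TEST"
printf 'USE_BREAKPAD=%s\n' "$USE_BREAKPAD"
printf '_GLIBCXX_USE_CXX11_ABI=%s\n' "$_GLIBCXX_USE_CXX11_ABI"

# The build is then started as shown earlier:
#   python setup.py install
```

The BUILD_TEST and USE_BREAKPAD values can be cross-checked against the corresponding lines in the build summary.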
The build summary is the following:
-- ******** Summary ********
-- General:
-- CMake version : 3.22.1
-- CMake command : /home/users/kaftan/anaconda3/envs/pt110cu113/bin/cmake
-- System : Linux
-- C++ compiler : /home/users/kaftan/anaconda3/envs/pt110cu113/bin/aarch64-conda-linux-gnu-c++
-- C++ compiler id : GNU
-- C++ compiler version : 10.4.0
-- Using ccache if found : ON
-- Found ccache : CCACHE_PROGRAM-NOTFOUND
-- CXX flags : -fvisibility-inlines-hidden -std=c++17 -fmessage-length=0 -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O3 -pipe -isystem /home/users/kaftan/anaconda3/envs/pt110cu113/include -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DUSE_KINETO -DLIBKINETO_NOCUPTI -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow
-- Build type : Debug
-- Compile definitions : ONNX_ML=1;ONNXIFI_ENABLE_EXT=1;ONNX_NAMESPACE=onnx_torch;HAVE_MMAP=1;_FILE_OFFSET_BITS=64;HAVE_SHM_OPEN=1;HAVE_SHM_UNLINK=1;HAVE_MALLOC_USABLE_SIZE=1;USE_EXTERNAL_MZCRC;MINIZ_DISABLE_ZIP_READER_CRC32_CHECKS
-- CMAKE_PREFIX_PATH : /home/users/kaftan/anaconda3/envs/pt110cu113/lib/python3.9/site-packages;/home/users/kaftan/anaconda3/envs/pt110cu113;/usr/local/cuda-11.3
-- CMAKE_INSTALL_PREFIX : /home/users/kaftan/pytorch/torch
-- USE_GOLD_LINKER : OFF
--
-- TORCH_VERSION : 1.11.0
-- CAFFE2_VERSION : 1.11.0
-- BUILD_CAFFE2 : OFF
-- BUILD_CAFFE2_OPS : OFF
-- BUILD_CAFFE2_MOBILE : OFF
-- BUILD_STATIC_RUNTIME_BENCHMARK: OFF
-- BUILD_TENSOREXPR_BENCHMARK: OFF
-- BUILD_NVFUSER_BENCHMARK: OFF
-- BUILD_BINARY : OFF
-- BUILD_CUSTOM_PROTOBUF : ON
-- Link local protobuf : ON
-- BUILD_DOCS : OFF
-- BUILD_PYTHON : True
-- Python version : 3.9.12
-- Python executable : /home/users/kaftan/anaconda3/envs/pt110cu113/bin/python
-- Pythonlibs version : 3.9.12
-- Python library : /home/users/kaftan/anaconda3/envs/pt110cu113/lib/libpython3.9.a
-- Python includes : /home/users/kaftan/anaconda3/envs/pt110cu113/include/python3.9
-- Python site-packages: lib/python3.9/site-packages
-- BUILD_SHARED_LIBS : ON
-- CAFFE2_USE_MSVC_STATIC_RUNTIME : OFF
-- BUILD_TEST : False
-- BUILD_JNI : OFF
-- BUILD_MOBILE_AUTOGRAD : OFF
-- BUILD_LITE_INTERPRETER: OFF
-- INTERN_BUILD_MOBILE :
-- USE_BLAS : 1
-- BLAS : open
-- BLAS_HAS_SBGEMM :
-- USE_LAPACK : 1
-- LAPACK : open
-- USE_ASAN : OFF
-- USE_CPP_CODE_COVERAGE : OFF
-- USE_CUDA : ON
-- Split CUDA : OFF
-- CUDA static link : OFF
-- USE_CUDNN : OFF
-- USE_EXPERIMENTAL_CUDNN_V8_API: OFF
-- CUDA version : 11.3
-- CUDA root directory : /usr/local/cuda-11.3
-- CUDA library : /usr/local/cuda-11.3/lib64/stubs/libcuda.so
-- cudart library : /usr/local/cuda-11.3/lib64/libcudart.so
-- cublas library : /usr/local/cuda-11.3/lib64/libcublas.so
-- cufft library : /usr/local/cuda-11.3/lib64/libcufft.so
-- curand library : /usr/local/cuda-11.3/lib64/libcurand.so
-- nvrtc : /usr/local/cuda-11.3/lib64/libnvrtc.so
-- CUDA include path : /usr/local/cuda-11.3/include
-- NVCC executable : /usr/local/cuda-11.3/bin/nvcc
-- CUDA compiler : /usr/local/cuda-11.3/bin/nvcc
-- CUDA flags : -Xfatbin -compress-all -DONNX_NAMESPACE=onnx_torch -gencode arch=compute_70,code=sm_70 -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=integer_sign_change,--diag_suppress=useless_using_declaration,--diag_suppress=set_but_not_used,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=implicit_return_from_non_void_function,--diag_suppress=unsigned_compare_with_zero,--diag_suppress=declared_but_not_referenced,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -Wno-deprecated-gpu-targets --expt-extended-lambda -DCUDA_HAS_FP16=1 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__
-- CUDA host compiler :
-- CUDA --device-c : OFF
-- USE_TENSORRT : OFF
-- USE_ROCM : OFF
-- USE_EIGEN_FOR_BLAS : ON
-- USE_FBGEMM : OFF
-- USE_FAKELOWP : OFF
-- USE_KINETO : ON
-- USE_FFMPEG : OFF
-- USE_GFLAGS : OFF
-- USE_GLOG : OFF
-- USE_LEVELDB : OFF
-- USE_LITE_PROTO : OFF
-- USE_LMDB : OFF
-- USE_METAL : OFF
-- USE_PYTORCH_METAL : OFF
-- USE_PYTORCH_METAL_EXPORT : OFF
-- USE_FFTW : OFF
-- USE_MKL : OFF
-- USE_MKLDNN : OFF
-- USE_NCCL : ON
-- USE_SYSTEM_NCCL : OFF
-- USE_NNPACK : ON
-- USE_NUMPY : ON
-- USE_OBSERVERS : ON
-- USE_OPENCL : OFF
-- USE_OPENCV : OFF
-- USE_OPENMP : ON
-- USE_TBB : OFF
-- USE_VULKAN : OFF
-- USE_PROF : OFF
-- USE_QNNPACK : ON
-- USE_PYTORCH_QNNPACK : ON
-- USE_REDIS : OFF
-- USE_ROCKSDB : OFF
-- USE_ZMQ : OFF
-- USE_DISTRIBUTED : ON
-- USE_MPI : OFF
-- USE_GLOO : ON
-- USE_GLOO_WITH_OPENSSL : OFF
-- USE_TENSORPIPE : ON
-- USE_DEPLOY : OFF
-- USE_BREAKPAD : 0
-- Public Dependencies : caffe2::Threads
-- Private Dependencies : pthreadpool;cpuinfo;qnnpack;pytorch_qnnpack;nnpack;XNNPACK;fp16;gloo;tensorpipe;foxi_loader;rt;fmt::fmt-header-only;kineto;gcc_s;gcc;dl
-- USE_COREML_DELEGATE : OFF
Please let me know if you need any more details on my build environment. Thank you for your help.
Versions
PyTorch version: N/A
Is debug build: N/A
CUDA used to build PyTorch: N/A
ROCM used to build PyTorch: N/A
OS: Ubuntu 22.04.1 LTS (aarch64)
GCC version: (conda-forge gcc 10.4.0-17) 10.4.0
Clang version: Could not collect
CMake version: version 3.22.1
Libc version: glibc-2.35
Python version: 3.9.12 (main, Jun 1 2022, 11:39:41) [GCC 10.2.0] (64-bit runtime)
Python platform: Linux-5.15.0-73-generic-aarch64-with-glibc2.35
Is CUDA available: N/A
CUDA runtime version: 11.3.109
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration:
GPU 0: Tesla V100S-PCIE-32GB
GPU 1: Tesla V100S-PCIE-32GB
GPU 2: Tesla V100S-PCIE-32GB
Nvidia driver version: 530.30.02
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: N/A
CPU:
Architecture: aarch64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 80
On-line CPU(s) list: 0-79
Vendor ID: ARM
Model name: Neoverse-N1
Model: 1
Thread(s) per core: 1
Core(s) per socket: 80
Socket(s): 1
Stepping: r3p1
Frequency boost: disabled
CPU max MHz: 3300.0000
CPU min MHz: 1000.0000
BogoMIPS: 50.00
Flags: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp ssbs
L1d cache: 5 MiB (80 instances)
L1i cache: 5 MiB (80 instances)
L2 cache: 80 MiB (80 instances)
NUMA node(s): 1
NUMA node0 CPU(s): 0-79
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Mmio stale data: Not affected
Vulnerability Retbleed: Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1: Mitigation; __user pointer sanitization
Vulnerability Spectre v2: Mitigation; CSV2, BHB
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected
Versions of relevant libraries:
[pip3] numpy==1.24.3
[conda] numpy 1.24.3 py39h8708280_0
[conda] numpy-base 1.24.3 py39h4a83355_0