Hello everyone,
I am trying to build PyTorch from source on a POWER9 machine and am running into errors during the build. First, my environment:
- 2x IBM POWER9 SO
- 2x Nvidia V100
- Ubuntu 18.04 LTS
- CUDA 10.1
- cuDNN 8.0.4
- Python 3.8.0
- Master branch of the repo (I have tried other branches too though)
I am using pip in a virtualenv because the larger project this build belongs to uses that combination, so I would prefer a non-conda solution.
This is the build summary printed during configuration:
-- ******** Summary ********
-- General:
-- CMake version : 3.18.4
-- CMake command : /home/lennart/dl2-benchmark/venv/lib/python3.8/site-packages/cmake/data/bin/cmake
-- System : Linux
-- C++ compiler : /usr/bin/c++
-- C++ compiler id : GNU
-- C++ compiler version : 7.5.0
-- CXX flags : -Wno-deprecated -fvisibility-inlines-hidden -fopenmp -DNDEBUG -DUSE_VULKAN_WRAPPER -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow
-- Build type : Release
-- Compile definitions : ONNX_ML=1;ONNXIFI_ENABLE_EXT=1;ONNX_NAMESPACE=onnx_torch;HAVE_MMAP=1;_FILE_OFFSET_BITS=64;HAVE_SHM_OPEN=1;HAVE_SHM_UNLINK=1;HAVE_MALLOC_USABLE_SIZE=1;USE_EXTERNAL_MZCRC;MINIZ_DISABLE_ZIP_READER_CRC32_CHECKS
-- CMAKE_PREFIX_PATH : /home/lennart/dl2-benchmark/venv/lib/python3.8/site-packages;/usr/local/cuda
-- CMAKE_INSTALL_PREFIX : /tmp/pip-req-build-omft2v9a/torch
--
-- TORCH_VERSION : 1.8.0
-- CAFFE2_VERSION : 1.8.0
-- BUILD_CAFFE2 : ON
-- BUILD_CAFFE2_OPS : ON
-- BUILD_CAFFE2_MOBILE : OFF
-- BUILD_STATIC_RUNTIME_BENCHMARK: OFF
-- BUILD_TENSOREXPR_BENCHMARK: OFF
-- BUILD_BINARY : OFF
-- BUILD_CUSTOM_PROTOBUF : ON
-- Link local protobuf : ON
-- BUILD_DOCS : OFF
-- BUILD_PYTHON : True
-- Python version : 3.8
-- Python executable : /home/lennart/dl2-benchmark/venv/bin/python
-- Pythonlibs version : 3.8.0
-- Python library : /usr/lib/libpython3.8.so.1.0
-- Python includes : /usr/include/python3.8
-- Python site-packages: lib/python3.8/site-packages
-- BUILD_SHARED_LIBS : ON
-- CAFFE2_USE_MSVC_STATIC_RUNTIME : OFF
-- BUILD_TEST : True
-- BUILD_JNI : OFF
-- BUILD_MOBILE_AUTOGRAD : OFF
-- INTERN_BUILD_MOBILE :
-- USE_BLAS : 0
-- USE_LAPACK : 0
-- USE_ASAN : OFF
-- USE_CPP_CODE_COVERAGE : OFF
-- USE_CUDA : ON
-- CUDA static link : OFF
-- USE_CUDNN : ON
-- CUDA version : 10.1
-- cuDNN version : 8.0.4
-- CUDA root directory : /usr/local/cuda
-- CUDA library : /usr/local/cuda/lib64/stubs/libcuda.so
-- cudart library : /usr/local/cuda/lib64/libcudart.so
-- cublas library : /usr/lib/powerpc64le-linux-gnu/libcublas.so
-- cufft library : /usr/local/cuda/lib64/libcufft.so
-- curand library : /usr/local/cuda/lib64/libcurand.so
-- cuDNN library : /usr/lib/powerpc64le-linux-gnu/libcudnn.so
-- nvrtc : /usr/local/cuda/lib64/libnvrtc.so
-- CUDA include path : /usr/local/cuda/include
-- NVCC executable : /usr/local/cuda/bin/nvcc
-- NVCC flags : -Xfatbin;-compress-all;-DONNX_NAMESPACE=onnx_torch;-gencode;arch=compute_70,code=sm_70;-Xcudafe;--diag_suppress=cc_clobber_ignored,--diag_suppress=integer_sign_change,--diag_suppress=useless_using_declaration,--diag_suppress=set_but_not_used,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=implicit_return_from_non_void_function,--diag_suppress=unsigned_compare_with_zero,--diag_suppress=declared_but_not_referenced,--diag_suppress=bad_friend_decl;-std=c++14;-Xcompiler;-fPIC;--expt-relaxed-constexpr;--expt-extended-lambda;-Wno-deprecated-gpu-targets;--expt-extended-lambda;-Xcompiler;-fPIC;-DCUDA_HAS_FP16=1;-D__CUDA_NO_HALF_OPERATORS__;-D__CUDA_NO_HALF_CONVERSIONS__;-D__CUDA_NO_BFLOAT16_CONVERSIONS__;-D__CUDA_NO_HALF2_OPERATORS__
-- CUDA host compiler : /usr/bin/cc
-- NVCC --device-c : OFF
-- USE_TENSORRT : OFF
-- USE_ROCM : OFF
-- USE_EIGEN_FOR_BLAS : ON
-- USE_FBGEMM : OFF
-- USE_FAKELOWP : OFF
-- USE_KINETO : OFF
-- USE_FFMPEG : OFF
-- USE_GFLAGS : OFF
-- USE_GLOG : OFF
-- USE_LEVELDB : OFF
-- USE_LITE_PROTO : OFF
-- USE_LMDB : OFF
-- USE_METAL : OFF
-- USE_PYTORCH_METAL : OFF
-- USE_MKL : OFF
-- USE_MKLDNN : OFF
-- USE_NCCL : ON
-- USE_SYSTEM_NCCL : OFF
-- USE_NNPACK : OFF
-- USE_NUMPY : ON
-- USE_OBSERVERS : ON
-- USE_OPENCL : OFF
-- USE_OPENCV : OFF
-- USE_OPENMP : ON
-- USE_TBB : OFF
-- USE_VULKAN : OFF
-- USE_PROF : OFF
-- USE_QNNPACK : OFF
-- USE_PYTORCH_QNNPACK : OFF
-- USE_REDIS : OFF
-- USE_ROCKSDB : OFF
-- USE_ZMQ : OFF
-- USE_DISTRIBUTED : ON
-- USE_MPI : ON
-- USE_GLOO : ON
-- USE_TENSORPIPE : ON
I don’t have much experience building PyTorch from source, so I can only guess at the root cause. My guess is that it is related to linker errors like these:
[3216/3432] Linking CXX executable bin/thread_init_test
FAILED: bin/thread_init_test
: && /usr/bin/c++ -Wno-deprecated -fvisibility-inlines-hidden -fopenmp -DNDEBUG -DUSE_VULKAN_WRAPPER -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow -O3 -DNDEBUG -DNDEBUG -rdynamic -L/usr/lib -pthread caffe2/CMakeFiles/thread_init_test.dir/__/aten/src/ATen/test/thread_init_test.cpp.o -o bin/thread_init_test -Wl,-rpath,/tmp/pip-req-build-omft2v9a/build/lib:/usr/local/cuda/lib64: lib/libgtest_main.a -Wl,--no-as-needed,"/tmp/pip-req-build-omft2v9a/build/lib/libtorch.so" -Wl,--as-needed -Wl,--no-as-needed,"/tmp/pip-req-build-omft2v9a/build/lib/libtorch_cpu.so" -Wl,--as-needed lib/libprotobuf.a -Wl,--no-as-needed,"/tmp/pip-req-build-omft2v9a/build/lib/libtorch_cuda.so" -Wl,--as-needed lib/libc10_cuda.so lib/libc10.so /usr/local/cuda/lib64/libcudart.so /usr/local/cuda/lib64/libnvToolsExt.so /usr/local/cuda/lib64/libcufft.so /usr/local/cuda/lib64/libcurand.so /usr/lib/powerpc64le-linux-gnu/libcublas.so /usr/lib/powerpc64le-linux-gnu/libcudnn.so lib/libgtest.a -pthread && :
/tmp/pip-req-build-omft2v9a/build/lib/libtorch_cuda.so: undefined reference to `cusparseSpMM'
/tmp/pip-req-build-omft2v9a/build/lib/libtorch_cuda.so: undefined reference to `cusparseSpMM_bufferSize'
/tmp/pip-req-build-omft2v9a/build/lib/libtorch_cuda.so: undefined reference to `cusparseCreateDnMat'
/tmp/pip-req-build-omft2v9a/build/lib/libtorch_cuda.so: undefined reference to `cusparseCreateCoo'
/tmp/pip-req-build-omft2v9a/build/lib/libtorch_cuda.so: undefined reference to `cusparseDestroyDnMat'
/tmp/pip-req-build-omft2v9a/build/lib/libtorch_cuda.so: undefined reference to `cusparseDestroySpMat'
collect2: error: ld returned 1 exit status
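To make the failure easier to compare against other reports, here is a minimal sketch that pulls the distinct missing symbols out of a linker log. The helper name `missing_symbols` and the regular expression are my own and not part of PyTorch or its build scripts. It assumes GNU ld's usual "undefined reference to" wording, as in the lines above.

```python
import re

# Matches GNU ld lines such as:
#   /path/libtorch_cuda.so: undefined reference to `cusparseSpMM'
# (assumed format; other linkers word this differently)
UNDEFINED_REF = re.compile(
    r"^(?P<obj>\S+): undefined reference to [`'](?P<sym>[^'`]+)'", re.M
)

def missing_symbols(log_text):
    """Map each object or library to the sorted list of symbols it is missing."""
    found = {}
    for match in UNDEFINED_REF.finditer(log_text):
        found.setdefault(match.group("obj"), set()).add(match.group("sym"))
    return {obj: sorted(syms) for obj, syms in found.items()}

if __name__ == "__main__":
    import sys
    # Usage: python missing_symbols.py < pip.log
    for obj, syms in missing_symbols(sys.stdin.read()).items():
        print(obj)
        for sym in syms:
            print("   ", sym)
```

For the excerpt above, this would list six symbols under `libtorch_cuda.so`, all with the `cusparse` prefix.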
The README on GitHub mentions a similar kind of linking error and recommends using at least Python 3.8.1. Unfortunately, 3.8.0 is the newest version available for PowerPC on Ubuntu 18.04. Before building Python from source, I would like to confirm whether the issue most likely lies in the Python version.
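One check that does not depend on the Python version is to ask whether a given shared library exports the missing symbols at all. The following is only a sketch using `ctypes` from the standard library. The helper name `exported` is made up, and the library path in the usage comment is a placeholder, since I have not confirmed where libcusparse lives on this machine.

```python
import ctypes

def exported(lib_path, names):
    """Return {name: bool} for each symbol, or None if the library cannot be loaded."""
    try:
        lib = ctypes.CDLL(lib_path)
    except OSError:
        # Missing file, wrong architecture, or unresolved dependencies.
        return None
    # ctypes raises AttributeError for unknown symbols, so hasattr works here.
    return {name: hasattr(lib, name) for name in names}

# Hypothetical usage; adjust the path to the libcusparse.so actually installed:
# print(exported("/usr/local/cuda/lib64/libcusparse.so",
#                ["cusparseSpMM", "cusparseCreateCoo", "cusparseCreateDnMat"]))
```

If the symbols show up as `False`, or the library cannot be loaded, that would point towards the CUDA libraries rather than the Python version, though this is a guess on my part.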
For full reference, I have uploaded the complete pip log [1] in case these errors are caused by something else. I would greatly appreciate it if someone could help me with this issue.
[1] https://gist.github.com/lbhm/29f106addcf4769bf1bf5544ba1aec7e