Building from source on a Power9 machine

Hello everyone,

I am trying to build PyTorch from source on a POWER9 machine, but the build fails with errors. First, my environment:

  • 2x IBM POWER9 SO
  • 2x Nvidia V100
  • Ubuntu 18.04 LTS
  • CUDA 10.1
  • cuDNN 8.0.4
  • Python 3.8.0
  • Master branch of the repo (I have tried other branches too though)

I am using pip in a virtualenv because the larger project this build is part of uses that combination, and I would prefer a non-conda solution.
This is the CMake summary from my build:

-- ******** Summary ********
-- General:
--   CMake version         : 3.18.4
--   CMake command         : /home/lennart/dl2-benchmark/venv/lib/python3.8/site-packages/cmake/data/bin/cmake
--   System                : Linux
--   C++ compiler          : /usr/bin/c++
--   C++ compiler id       : GNU
--   C++ compiler version  : 7.5.0
--   CXX flags             :  -Wno-deprecated -fvisibility-inlines-hidden -fopenmp -DNDEBUG -DUSE_VULKAN_WRAPPER -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow
--   Build type            : Release
--   Compile definitions   : ONNX_ML=1;ONNXIFI_ENABLE_EXT=1;ONNX_NAMESPACE=onnx_torch;HAVE_MMAP=1;_FILE_OFFSET_BITS=64;HAVE_SHM_OPEN=1;HAVE_SHM_UNLINK=1;HAVE_MALLOC_USABLE_SIZE=1;USE_EXTERNAL_MZCRC;MINIZ_DISABLE_ZIP_READER_CRC32_CHECKS
--   CMAKE_PREFIX_PATH     : /home/lennart/dl2-benchmark/venv/lib/python3.8/site-packages;/usr/local/cuda
--   CMAKE_INSTALL_PREFIX  : /tmp/pip-req-build-omft2v9a/torch
--
--   TORCH_VERSION         : 1.8.0
--   CAFFE2_VERSION        : 1.8.0
--   BUILD_CAFFE2          : ON
--   BUILD_CAFFE2_OPS      : ON
--   BUILD_CAFFE2_MOBILE   : OFF
--   BUILD_STATIC_RUNTIME_BENCHMARK: OFF
--   BUILD_TENSOREXPR_BENCHMARK: OFF
--   BUILD_BINARY          : OFF
--   BUILD_CUSTOM_PROTOBUF : ON
--     Link local protobuf : ON
--   BUILD_DOCS            : OFF
--   BUILD_PYTHON          : True
--     Python version      : 3.8
--     Python executable   : /home/lennart/dl2-benchmark/venv/bin/python
--     Pythonlibs version  : 3.8.0
--     Python library      : /usr/lib/libpython3.8.so.1.0
--     Python includes     : /usr/include/python3.8
--     Python site-packages: lib/python3.8/site-packages
--   BUILD_SHARED_LIBS     : ON
--   CAFFE2_USE_MSVC_STATIC_RUNTIME     : OFF
--   BUILD_TEST            : True
--   BUILD_JNI             : OFF
--   BUILD_MOBILE_AUTOGRAD : OFF
--   INTERN_BUILD_MOBILE   :
--   USE_BLAS              : 0
--   USE_LAPACK            : 0
--   USE_ASAN              : OFF
--   USE_CPP_CODE_COVERAGE : OFF
--   USE_CUDA              : ON
--     CUDA static link    : OFF
--     USE_CUDNN           : ON
--     CUDA version        : 10.1
--     cuDNN version       : 8.0.4
--     CUDA root directory : /usr/local/cuda
--     CUDA library        : /usr/local/cuda/lib64/stubs/libcuda.so
--     cudart library      : /usr/local/cuda/lib64/libcudart.so
--     cublas library      : /usr/lib/powerpc64le-linux-gnu/libcublas.so
--     cufft library       : /usr/local/cuda/lib64/libcufft.so
--     curand library      : /usr/local/cuda/lib64/libcurand.so
--     cuDNN library       : /usr/lib/powerpc64le-linux-gnu/libcudnn.so
--     nvrtc               : /usr/local/cuda/lib64/libnvrtc.so
--     CUDA include path   : /usr/local/cuda/include
--     NVCC executable     : /usr/local/cuda/bin/nvcc
--     NVCC flags          : -Xfatbin;-compress-all;-DONNX_NAMESPACE=onnx_torch;-gencode;arch=compute_70,code=sm_70;-Xcudafe;--diag_suppress=cc_clobber_ignored,--diag_suppress=integer_sign_change,--diag_suppress=useless_using_declaration,--diag_suppress=set_but_not_used,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=implicit_return_from_non_void_function,--diag_suppress=unsigned_compare_with_zero,--diag_suppress=declared_but_not_referenced,--diag_suppress=bad_friend_decl;-std=c++14;-Xcompiler;-fPIC;--expt-relaxed-constexpr;--expt-extended-lambda;-Wno-deprecated-gpu-targets;--expt-extended-lambda;-Xcompiler;-fPIC;-DCUDA_HAS_FP16=1;-D__CUDA_NO_HALF_OPERATORS__;-D__CUDA_NO_HALF_CONVERSIONS__;-D__CUDA_NO_BFLOAT16_CONVERSIONS__;-D__CUDA_NO_HALF2_OPERATORS__
--     CUDA host compiler  : /usr/bin/cc
--     NVCC --device-c     : OFF
--     USE_TENSORRT        : OFF
--   USE_ROCM              : OFF
--   USE_EIGEN_FOR_BLAS    : ON
--   USE_FBGEMM            : OFF
--     USE_FAKELOWP          : OFF
--   USE_KINETO            : OFF
--   USE_FFMPEG            : OFF
--   USE_GFLAGS            : OFF
--   USE_GLOG              : OFF
--   USE_LEVELDB           : OFF
--   USE_LITE_PROTO        : OFF
--   USE_LMDB              : OFF
--   USE_METAL             : OFF
--   USE_PYTORCH_METAL     : OFF
--   USE_MKL               : OFF
--   USE_MKLDNN            : OFF
--   USE_NCCL              : ON
--     USE_SYSTEM_NCCL     : OFF
--   USE_NNPACK            : OFF
--   USE_NUMPY             : ON
--   USE_OBSERVERS         : ON
--   USE_OPENCL            : OFF
--   USE_OPENCV            : OFF
--   USE_OPENMP            : ON
--   USE_TBB               : OFF
--   USE_VULKAN            : OFF
--   USE_PROF              : OFF
--   USE_QNNPACK           : OFF
--   USE_PYTORCH_QNNPACK   : OFF
--   USE_REDIS             : OFF
--   USE_ROCKSDB           : OFF
--   USE_ZMQ               : OFF
--   USE_DISTRIBUTED       : ON
--     USE_MPI             : ON
--     USE_GLOO            : ON
--     USE_TENSORPIPE      : ON

I don’t have much experience building PyTorch from source, so I can only guess at the root cause. My guess is that it relates to linker errors like these:

[3216/3432] Linking CXX executable bin/thread_init_test
FAILED: bin/thread_init_test
: && /usr/bin/c++ -Wno-deprecated -fvisibility-inlines-hidden -fopenmp -DNDEBUG -DUSE_VULKAN_WRAPPER -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow -O3 -DNDEBUG -DNDEBUG -rdynamic -L/usr/lib -pthread caffe2/CMakeFiles/thread_init_test.dir/__/aten/src/ATen/test/thread_init_test.cpp.o -o bin/thread_init_test  -Wl,-rpath,/tmp/pip-req-build-omft2v9a/build/lib:/usr/local/cuda/lib64:  lib/libgtest_main.a  -Wl,--no-as-needed,"/tmp/pip-req-build-omft2v9a/build/lib/libtorch.so" -Wl,--as-needed  -Wl,--no-as-needed,"/tmp/pip-req-build-omft2v9a/build/lib/libtorch_cpu.so" -Wl,--as-needed  lib/libprotobuf.a  -Wl,--no-as-needed,"/tmp/pip-req-build-omft2v9a/build/lib/libtorch_cuda.so" -Wl,--as-needed  lib/libc10_cuda.so  lib/libc10.so  /usr/local/cuda/lib64/libcudart.so  /usr/local/cuda/lib64/libnvToolsExt.so  /usr/local/cuda/lib64/libcufft.so  /usr/local/cuda/lib64/libcurand.so  /usr/lib/powerpc64le-linux-gnu/libcublas.so  /usr/lib/powerpc64le-linux-gnu/libcudnn.so  lib/libgtest.a  -pthread && :
/tmp/pip-req-build-omft2v9a/build/lib/libtorch_cuda.so: undefined reference to `cusparseSpMM'
/tmp/pip-req-build-omft2v9a/build/lib/libtorch_cuda.so: undefined reference to `cusparseSpMM_bufferSize'
/tmp/pip-req-build-omft2v9a/build/lib/libtorch_cuda.so: undefined reference to `cusparseCreateDnMat'
/tmp/pip-req-build-omft2v9a/build/lib/libtorch_cuda.so: undefined reference to `cusparseCreateCoo'
/tmp/pip-req-build-omft2v9a/build/lib/libtorch_cuda.so: undefined reference to `cusparseDestroyDnMat'
/tmp/pip-req-build-omft2v9a/build/lib/libtorch_cuda.so: undefined reference to `cusparseDestroySpMat'
collect2: error: ld returned 1 exit status
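
One way to narrow down errors like these is to check whether the CUDA library that should provide the symbols actually exports them. The following is a minimal sketch, not part of the original build: the helper names `undefined_symbols` and `missing_from_library` are hypothetical, and the path `/usr/local/cuda/lib64/libcusparse.so` is an assumption based on the CUDA root directory in the summary above (the summary does not list the cuSPARSE library explicitly).

```python
import ctypes
import re

# Matches linker lines such as:
#   .../libtorch_cuda.so: undefined reference to `cusparseSpMM'
UNDEFINED_REF = re.compile(r"undefined reference to `([^']+)'")


def undefined_symbols(linker_log):
    """Return the unique symbol names reported as undefined, in order of appearance."""
    seen = []
    for name in UNDEFINED_REF.findall(linker_log):
        if name not in seen:
            seen.append(name)
    return seen


def missing_from_library(library_path, symbols):
    """Return the symbols that the shared library at library_path does not export."""
    # ctypes.CDLL raises OSError if the library cannot be loaded.
    lib = ctypes.CDLL(library_path)
    # Attribute lookup on a CDLL raises AttributeError for unknown symbols,
    # so hasattr() tells us whether the symbol is exported.
    return [name for name in symbols if not hasattr(lib, name)]


if __name__ == "__main__":
    log = (
        "/tmp/pip-req-build-omft2v9a/build/lib/libtorch_cuda.so: "
        "undefined reference to `cusparseSpMM'\n"
        "/tmp/pip-req-build-omft2v9a/build/lib/libtorch_cuda.so: "
        "undefined reference to `cusparseCreateCoo'\n"
    )
    wanted = undefined_symbols(log)
    # Assumed location of cuSPARSE; adjust to your installation.
    path = "/usr/local/cuda/lib64/libcusparse.so"
    try:
        print("missing from", path, ":", missing_from_library(path, wanted))
    except OSError as exc:
        print("could not load", path, ":", exc)
```

If the symbols show up as missing from the library itself, the cause is more likely the installed CUDA libraries than the PyTorch build configuration.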

The README on GitHub mentions a similar kind of linking error and recommends using at least Python 3.8.1. Unfortunately, 3.8.0 is the newest version available for PowerPC on Ubuntu 18.04. Before building Python from source, I would like to confirm whether the Python version is the most likely cause.

For reference, I have uploaded the full pip log [1] in case these errors are caused by something else. I would greatly appreciate it if someone could help me with this issue.

[1] https://gist.github.com/lbhm/29f106addcf4769bf1bf5544ba1aec7e

I eventually fixed this problem by upgrading CUDA from 10.1.168 to 10.1.243 or later.
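
To check which toolkit version is installed before rebuilding, a small sketch like the following can help. It assumes that `nvcc --version` prints a version token of the form `V10.1.168`, and it uses the NVCC path from the build summary above. The helper names `parse_nvcc_version` and `is_new_enough` are hypothetical, and the threshold is simply the version that worked in this case.

```python
import re
import subprocess

# First CUDA version that worked in this thread (an observation, not an official minimum).
MIN_CUDA = (10, 1, 243)


def parse_nvcc_version(nvcc_output):
    """Extract (major, minor, patch) from the text printed by `nvcc --version`."""
    match = re.search(r"V(\d+)\.(\d+)\.(\d+)", nvcc_output)
    if match is None:
        raise ValueError("no version token like V10.1.243 found")
    return tuple(int(part) for part in match.groups())


def is_new_enough(version, minimum=MIN_CUDA):
    """Compare version tuples element by element, e.g. (10, 1, 168) < (10, 1, 243)."""
    return version >= minimum


if __name__ == "__main__":
    nvcc = "/usr/local/cuda/bin/nvcc"  # path taken from the build summary
    try:
        result = subprocess.run(
            [nvcc, "--version"], capture_output=True, text=True, check=True
        )
    except (OSError, subprocess.CalledProcessError) as exc:
        print("could not run", nvcc, ":", exc)
    else:
        version = parse_nvcc_version(result.stdout)
        verdict = "ok" if is_new_enough(version) else "older than 10.1.243"
        print("CUDA", ".".join(map(str, version)), verdict)
```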

An excerpt from the CUDA release notes in case other PowerPC users stumble upon this problem:
“The cuSPARSE generic APIs are currently available only for Linux x86_64 (AMD64) systems. Using these APIs on any other systems will result in compile-time or run-time failures.” [2]

[2] https://docs.nvidia.com/cuda/archive/10.1/cuda-toolkit-release-notes/index.html#cuda-u1-libraries-known-issues