I am trying to build PyTorch 1.9.0 from source on a POWER8 machine with CUDA 11.5 and Python 3.8. As far as I understand, there are no binaries or build configurations for this setup, so I have been trying to find a workaround. My approach is to follow the From Source section of the PyTorch repository, except that I run git checkout tags/v1.9.0 before syncing and updating the submodules. Then I create a conda environment and run:
$ conda install numpy ninja pyyaml setuptools cmake cffi typing_extensions future six requests dataclasses
Note that I do not install mkl, as I am building on a ppc64le architecture. I then have to separately install magma via the compass channel, which provides a build compatible with CUDA 11.2. Then I export the following environment variables:
export PATH=/usr/local/cuda-11.5/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-11.5/lib64:$LD_LIBRARY_PATH
export CC=/usr/bin/gcc
export CXX=/usr/bin/g++
export USE_CUDA="True"
If I run the setup.py script at this point, I get multiple errors about cub namespace bugs and conflicts with thrust, which seem to be addressed in later commits. To work around them I followed the updates made in this commit: I created the caffe2/utils/cub_namespace.cuh header, adjusted the include statements in the relevant caffe2 files, and adjusted cmake/Dependencies.cmake.
After updating the git submodules, I run the setup script:
export CMAKE_PREFIX_PATH=${CONDA_PREFIX:-"$(dirname $(which conda))/../"}
BUILD_TEST=0 USE_SYSTEM_NCCL=1 python setup.py install
This appears to address the aforementioned issues, and the build almost completes, but around step 3080 of 3110 I get the following error:
FAILED: bin/torch_shm_manager
: && /usr/bin/g++ -Wno-deprecated -fvisibility-inlines-hidden -fopenmp -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOCUPTI -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow -DHAVE_VSX_CPU_DEFINITION -O3 -DNDEBUG -DNDEBUG -rdynamic -rdynamic caffe2/torch/lib/libshm/CMakeFiles/torch_shm_manager.dir/manager.cpp.o -o bin/torch_shm_manager -Wl,-rpath,/home/mac/pytorch/build/lib:/home/mac/miniconda3/envs/aml_env/lib:/usr/local/cuda-11.5/lib64:/usr/local/cuda-11.5/lib: lib/libshm.so -lrt lib/libtorch.so -Wl,--no-as-needed,"/home/mac/pytorch/build/lib/libtorch_cpu.so" -Wl,--as-needed lib/libprotobuf.a -pthread -Wl,--no-as-needed,"/home/mac/pytorch/build/lib/libtorch_cuda.so" -Wl,--as-needed lib/libc10_cuda.so /usr/local/cuda-11.5/lib64/libcudart.so /home/mac/miniconda3/envs/aml_env/lib/libnvToolsExt.so /usr/local/cuda-11.5/lib64/libcufft.so /usr/local/cuda-11.5/lib64/libcurand.so /usr/local/cuda-11.5/lib64/libcublas.so /usr/local/cuda-11.5/lib/libcudnn.so lib/libc10.so && :
/usr/local/cuda-11.5/lib64/libcublas.so: undefined reference to `cublasLtGetStatusString@libcublasLt.so.11'
/usr/local/cuda-11.5/lib64/libcublas.so: undefined reference to `cublasLtGetStatusName@libcublasLt.so.11'
collect2: error: ld returned 1 exit status
[3084/3110] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/autograd/generated/python_torch_functions.cpp.o
[3085/3110] Building NVCC (Device) object modules/detectron/CMakeFiles/caffe2_detectron_ops_gpu.dir/caffe2_detectron_ops_gpu_generated_group_spatial_softmax_op.cu.o
[3086/3110] Building NVCC (Device) object modules/detectron/CMakeFiles/caffe2_detectron_ops_gpu.dir/caffe2_detectron_ops_gpu_generated_smooth_l1_loss_op.cu.o
[3087/3110] Building NVCC (Device) object modules/detectron/CMakeFiles/caffe2_detectron_ops_gpu.dir/caffe2_detectron_ops_gpu_generated_ps_roi_pool_op.cu.o
[3088/3110] Building NVCC (Device) object modules/detectron/CMakeFiles/caffe2_detectron_ops_gpu.dir/caffe2_detectron_ops_gpu_generated_upsample_nearest_op.cu.o
[3089/3110] Building NVCC (Device) object modules/detectron/CMakeFiles/caffe2_detectron_ops_gpu.dir/caffe2_detectron_ops_gpu_generated_sigmoid_focal_loss_op.cu.o
[3090/3110] Building NVCC (Device) object modules/detectron/CMakeFiles/caffe2_detectron_ops_gpu.dir/caffe2_detectron_ops_gpu_generated_select_smooth_l1_loss_op.cu.o
[3091/3110] Building NVCC (Device) object modules/detectron/CMakeFiles/caffe2_detectron_ops_gpu.dir/caffe2_detectron_ops_gpu_generated_sample_as_op.cu.o
[3092/3110] Building NVCC (Device) object modules/detectron/CMakeFiles/caffe2_detectron_ops_gpu.dir/caffe2_detectron_ops_gpu_generated_roi_pool_f_op.cu.o
[3093/3110] Building NVCC (Device) object modules/detectron/CMakeFiles/caffe2_detectron_ops_gpu.dir/caffe2_detectron_ops_gpu_generated_spatial_narrow_as_op.cu.o
[3094/3110] Building NVCC (Device) object modules/detectron/CMakeFiles/caffe2_detectron_ops_gpu.dir/caffe2_detectron_ops_gpu_generated_softmax_focal_loss_op.cu.o
[3095/3110] Building NVCC (Device) object modules/detectron/CMakeFiles/caffe2_detectron_ops_gpu.dir/caffe2_detectron_ops_gpu_generated_sigmoid_cross_entropy_loss_op.cu.o
[3096/3110] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/autograd/generated/python_functions.cpp.o
ninja: build stopped: subcommand failed.
Building wheel torch-1.9.0a0+gitd69c22d
-- Building version 1.9.0a0+gitd69c22d
cmake -GNinja -DBUILD_PYTHON=True -DBUILD_TEST=False -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=/home/mac/pytorch/torch -DCMAKE_PREFIX_PATH=/home/mac/miniconda3/envs/aml_env -DNUMPY_INCLUDE_DIR=/home/mac/miniconda3/envs/aml_env/lib/python3.8/site-packages/numpy/core/include -DPYTHON_EXECUTABLE=/home/mac/miniconda3/envs/aml_env/bin/python -DPYTHON_INCLUDE_DIR=/home/mac/miniconda3/envs/aml_env/include/python3.8 -DPYTHON_LIBRARY=/home/mac/miniconda3/envs/aml_env/lib/libpython3.8.so.1.0 -DTORCH_BUILD_VERSION=1.9.0a0+gitd69c22d -DUSE_CUDA=True -DUSE_NUMPY=True -DUSE_SYSTEM_NCCL=1 /home/mac/pytorch
cmake --build . --target install --config Release -- -j 160
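One thing that may be worth checking (an assumption on my part, not something confirmed by the build above): GNU ld resolves the libcublasLt.so.11 dependency of libcublas.so by searching the -rpath directories in order, and in the failing command the conda environment's lib directory comes before /usr/local/cuda-11.5/lib64. Below is a minimal shell sketch that lists, in that order, every directory holding a copy of the library. find_lib_on_path is a hypothetical helper name, and the RPATH string is copied from the link command.

```shell
#!/bin/sh
# Print every copy of a library found on a colon-separated search path,
# in search order. The first line printed is the copy found first.
# The body runs in a subshell so the IFS and globbing changes stay local.
find_lib_on_path() (
  lib=$1
  IFS=:
  set -f                      # do not glob-expand path components
  for dir in $2; do
    [ -n "$dir" ] || continue # skip empty components such as a trailing ':'
    if [ -e "$dir/$lib" ]; then
      printf '%s\n' "$dir/$lib"
    fi
  done
)

# rpath taken from the failing torch_shm_manager link command
RPATH="/home/mac/pytorch/build/lib:/home/mac/miniconda3/envs/aml_env/lib:/usr/local/cuda-11.5/lib64:/usr/local/cuda-11.5/lib"
find_lib_on_path libcublasLt.so.11 "$RPATH"
```

If the first hit is under the conda environment rather than /usr/local/cuda-11.5/lib64, running nm -D --defined-only on each copy and searching for cublasLtGetStatusString should show which one actually exports the missing symbol.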
Here is an overview of the environment variables and the locations/versions of the relevant build tools and libraries:
-- ******** Summary ********
-- General:
-- CMake version : 3.19.6
-- CMake command : /home/mac/miniconda3/envs/aml_env/bin/cmake
-- System : Linux
-- C++ compiler : /usr/bin/g++
-- C++ compiler id : GNU
-- C++ compiler version : 8.5.0
-- Using ccache if found : ON
-- Found ccache : CCACHE_PROGRAM-NOTFOUND
-- CXX flags : -Wno-deprecated -fvisibility-inlines-hidden -fopenmp -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOCUPTI -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow
-- Build type : Release
-- Compile definitions : ONNX_ML=1;ONNXIFI_ENABLE_EXT=1;ONNX_NAMESPACE=onnx_torch;MAGMA_V2;HAVE_MMAP=1;_FILE_OFFSET_BITS=64;HAVE_SHM_OPEN=1;HAVE_SHM_UNLINK=1;HAVE_MALLOC_USABLE_SIZE=1;USE_EXTERNAL_MZCRC;MINIZ_DISABLE_ZIP_READER_CRC32_CHECKS
-- CMAKE_PREFIX_PATH : /home/mac/miniconda3/envs/aml_env;/usr/local/cuda-11.5;/usr/local/cuda-11.5
-- CMAKE_INSTALL_PREFIX : /home/mac/pytorch/torch
-- USE_GOLD_LINKER : OFF
--
-- TORCH_VERSION : 1.9.0
-- CAFFE2_VERSION : 1.9.0
-- BUILD_CAFFE2 : ON
-- BUILD_CAFFE2_OPS : ON
-- BUILD_CAFFE2_MOBILE : OFF
-- BUILD_STATIC_RUNTIME_BENCHMARK: OFF
-- BUILD_TENSOREXPR_BENCHMARK: OFF
-- BUILD_BINARY : OFF
-- BUILD_CUSTOM_PROTOBUF : ON
-- Link local protobuf : ON
-- BUILD_DOCS : OFF
-- BUILD_PYTHON : True
-- Python version : 3.8.12
-- Python executable : /home/mac/miniconda3/envs/aml_env/bin/python
-- Pythonlibs version : 3.8.12
-- Python library : /home/mac/miniconda3/envs/aml_env/lib/libpython3.8.so.1.0
-- Python includes : /home/mac/miniconda3/envs/aml_env/include/python3.8
-- Python site-packages: lib/python3.8/site-packages
-- BUILD_SHARED_LIBS : ON
-- CAFFE2_USE_MSVC_STATIC_RUNTIME : OFF
-- BUILD_TEST : False
-- BUILD_JNI : OFF
-- BUILD_MOBILE_AUTOGRAD : OFF
-- BUILD_LITE_INTERPRETER: OFF
-- INTERN_BUILD_MOBILE :
-- USE_BLAS : 1
-- BLAS : open
-- USE_LAPACK : 1
-- LAPACK : open
-- USE_ASAN : OFF
-- USE_CPP_CODE_COVERAGE : OFF
-- USE_CUDA : True
-- Split CUDA : OFF
-- CUDA static link : OFF
-- USE_CUDNN : ON
-- CUDA version : 11.5
-- cuDNN version : 8.3.1
-- CUDA root directory : /usr/local/cuda-11.5
-- CUDA library : /usr/local/cuda-11.5/lib64/stubs/libcuda.so
-- cudart library : /usr/local/cuda-11.5/lib64/libcudart.so
-- cublas library : /usr/local/cuda-11.5/lib64/libcublas.so
-- cufft library : /usr/local/cuda-11.5/lib64/libcufft.so
-- curand library : /usr/local/cuda-11.5/lib64/libcurand.so
-- cuDNN library : /usr/local/cuda-11.5/lib/libcudnn.so
-- nvrtc : /home/mac/miniconda3/envs/aml_env/lib/libnvrtc.so
-- CUDA include path : /usr/local/cuda-11.5/include
-- NVCC executable : /usr/local/cuda-11.5/bin/nvcc
-- NVCC flags : -Xfatbin;-compress-all;-DONNX_NAMESPACE=onnx_torch;-gencode;arch=compute_60,code=sm_60;-Xcudafe;--diag_suppress=cc_clobber_ignored,--diag_suppress=integer_sign_change,--diag_suppress=useless_using_declaration,--diag_suppress=set_but_not_used,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=implicit_return_from_non_void_function,--diag_suppress=unsigned_compare_with_zero,--diag_suppress=declared_but_not_referenced,--diag_suppress=bad_friend_decl;-std=c++14;-Xcompiler;-fPIC;--expt-relaxed-constexpr;--expt-extended-lambda -Wno-deprecated-gpu-targets --expt-extended-lambda -DCUB_WRAPPED_NAMESPACE=at_cuda_detail -DCUDA_HAS_FP16=1 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__;-Xcompiler;-fPIC
-- CUDA host compiler : /usr/bin/gcc
-- NVCC --device-c : OFF
-- USE_TENSORRT : OFF
-- USE_ROCM : OFF
-- USE_EIGEN_FOR_BLAS : ON
-- USE_FBGEMM : OFF
-- USE_FAKELOWP : OFF
-- USE_KINETO : ON
-- USE_FFMPEG : OFF
-- USE_GFLAGS : OFF
-- USE_GLOG : OFF
-- USE_LEVELDB : OFF
-- USE_LITE_PROTO : OFF
-- USE_LMDB : OFF
-- USE_METAL : OFF
-- USE_PYTORCH_METAL : OFF
-- USE_FFTW : OFF
-- USE_MKL : OFF
-- USE_MKLDNN : OFF
-- USE_NCCL : ON
-- USE_SYSTEM_NCCL : 1
-- USE_NNPACK : OFF
-- USE_NUMPY : ON
-- USE_OBSERVERS : ON
-- USE_OPENCL : OFF
-- USE_OPENCV : OFF
-- USE_OPENMP : ON
-- USE_TBB : OFF
-- USE_VULKAN : OFF
-- USE_PROF : OFF
-- USE_QNNPACK : OFF
-- USE_PYTORCH_QNNPACK : OFF
-- USE_REDIS : OFF
-- USE_ROCKSDB : OFF
-- USE_ZMQ : OFF
-- USE_DISTRIBUTED : ON
-- USE_MPI : OFF
-- USE_GLOO : ON
-- USE_TENSORPIPE : ON
-- USE_DEPLOY : OFF
-- Public Dependencies : Threads::Threads
-- Private Dependencies : cpuinfo;fp16;gloo;tensorpipe;aten_op_header_gen;foxi_loader;rt;fmt::fmt-header-only;kineto;gcc_s;gcc;dl
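The summary above lists nvrtc under the conda environment, while the other CUDA libraries come from /usr/local/cuda-11.5. As a rough way to spot such entries, here is a small sketch that filters the summary for CUDA-related library lines whose path is not under a given root. flag_libs_outside_cuda_root is a hypothetical helper name, the list of labels is taken from the summary, and whether this mismatch is related to the link error is only an assumption.

```shell
#!/bin/sh
# Read a CMake summary on stdin and print the CUDA-related library lines
# whose path (the last field) does not start with the given CUDA root.
flag_libs_outside_cuda_root() {
  awk -v root="$1" '
    /^-- +(CUDA library|cudart library|cublas library|cufft library|curand library|cuDNN library|nvrtc) *:/ {
      path = $NF
      if (index(path, root "/") != 1) print
    }'
}

# Two lines copied from the summary above; only the nvrtc line is printed.
flag_libs_outside_cuda_root /usr/local/cuda-11.5 <<'EOF'
-- cublas library : /usr/local/cuda-11.5/lib64/libcublas.so
-- nvrtc : /home/mac/miniconda3/envs/aml_env/lib/libnvrtc.so
EOF
```

The same function can be fed the full configure output through a pipe or a redirect from a saved log file.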
Any idea on what the problem could be? Thanks in advance!