I installed NCCL 2.4.8 using the “O/S agnostic local installer” option from the NVIDIA website. This gave me a file, nccl_2.4.8-1+cuda10.1_x86_64.txz, which I extracted into a new directory, /opt/nccl-2.4.8. I’m trying to compile PyTorch 1.4.1 (exactly @ git tag v1.4.1) now to use this NCCL installation, as I want it to be consistent within my Anaconda environment with other compiled applications there that use NCCL. As this is a server installation, I am also trying to make it possible to have multiple versions of CUDA, cuDNN, NCCL, TensorRT, etc. in parallel, so all of them need to be totally local installs (e.g. no debs).
So far I can’t get cmake to be happy that it has found NCCL properly. That is, it finds NCCL fine (both headers and library), but then fails to identify its version, and the ‘header matches library’ check fails too (I need to manually force the version identification to ‘succeed’ in order for cmake to get that far though). Does anyone have any experience with this?
Ubuntu 18.04, PyTorch 1.4.1, CUDA 10.1.243, cuDNN 7.6.5, CMake 3.10.2, NCCL 2.4.8
# Fresh compile...
rm -rf ~/Programs/PyTorch/pytorch/build/*
cd ~/Programs/PyTorch/pytorch
conda activate dl
source /usr/local/cuda-10.1/add_path.sh
source /opt/openmpi-2.1.1/add_path.sh
source /opt/nccl-2.4.8/add_path.sh
source /opt/TensorRT-6.0.1.5/add_path.sh
export CMAKE_PREFIX_PATH="${CONDA_PREFIX:-"$(dirname $(which conda))/../"}"
export CUDA_LIB_PATH=/usr/local/cuda-10.1/extras/system
export NCCL_ROOT_DIR=/opt/nccl-2.4.8
export USE_SYSTEM_NCCL=ON
export TENSORRT_ROOT=/opt/TensorRT-6.0.1.5
export BUILD_BINARY=ON
export BUILD_DOCS=ON
export USE_NCCL=ON
export USE_TENSORRT=ON
export USE_FFMPEG=ON
export USE_OPENMP=ON
export USE_OPENCV=ON
export USE_MKLDNN=ON
export USE_NNPACK=ON
export USE_GFLAGS=ON
export USE_GLOG=ON
export GPU_ARCH=75
$ env | grep PATH
LD_LIBRARY_PATH=/opt/TensorRT-6.0.1.5/lib:/opt/nccl-2.4.8/lib:/opt/openmpi-2.1.1/lib:/usr/local/cuda-10.1/extras/CUPTI/lib64:/usr/local/cuda-10.1/lib64:/usr/local/cuda-10.1/extras/system/lib64
CUDA_LIB_PATH=/usr/local/cuda-10.1/extras/system
CUDA_PATH=/usr/local/cuda-10.1
CMAKE_PREFIX_PATH=/home/escarda/anaconda3/envs/dl
PATH=/opt/TensorRT-6.0.1.5/bin:/opt/openmpi-2.1.1/bin:/usr/local/cuda-10.1/bin:/home/escarda/anaconda3/envs/dl/bin:/home/escarda/anaconda3/condabin:/usr/local/texlive/2019/bin/x86_64-linux:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin
...
$ python setup.py build --cmake-only
Building wheel torch-1.4.0a0+7404463
-- Building version 1.4.0a0+7404463
cmake -GNinja -DBUILD_BINARY=ON -DBUILD_DOCS=ON -DBUILD_PYTHON=True -DBUILD_TEST=True -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=/home/escarda/Programs/PyTorch/pytorch/torch -DCMAKE_PREFIX_PATH=/home/escarda/anaconda3/envs/dl -DNUMPY_INCLUDE_DIR=/home/escarda/anaconda3/envs/dl/lib/python3.6/site-packages/numpy/core/include -DPYTHON_EXECUTABLE=/home/escarda/anaconda3/envs/dl/bin/python -DPYTHON_INCLUDE_DIR=/home/escarda/anaconda3/envs/dl/include/python3.6m -DPYTHON_LIBRARY=/home/escarda/anaconda3/envs/dl/lib/libpython3.6m.so.1.0 -DTORCH_BUILD_VERSION=1.4.0a0+7404463 -DUSE_FFMPEG=ON -DUSE_GFLAGS=ON -DUSE_GLOG=ON -DUSE_MKLDNN=ON -DUSE_NCCL=ON -DUSE_NNPACK=ON -DUSE_NUMPY=True -DUSE_OPENCV=ON -DUSE_OPENMP=ON -DUSE_SYSTEM_NCCL=ON -DUSE_TENSORRT=ON /home/escarda/Programs/PyTorch/pytorch
-- The CXX compiler identification is GNU 7.5.0
-- The C compiler identification is GNU 7.5.0
...
-- Found CUDA: /usr/local/cuda-10.1 (found version "10.1")
-- Caffe2: CUDA detected: 10.1
-- Caffe2: CUDA nvcc is: /usr/local/cuda-10.1/bin/nvcc
-- Caffe2: CUDA toolkit directory: /usr/local/cuda-10.1
-- Caffe2: Header version is: 10.1
-- Found CUDNN: /usr/local/cuda-10.1/lib64/libcudnn.so
-- Found TENSORRT: /opt/TensorRT-6.0.1.5/include
-- Found cuDNN: v7.6.5 (include: /usr/local/cuda-10.1/include, library: /usr/local/cuda-10.1/lib64/libcudnn.so)
-- Autodetected CUDA architecture(s): 7.5 7.5 7.5 7.5
-- Added CUDA NVCC flags for: -gencode;arch=compute_75,code=sm_75
-- Found NCCL: /opt/nccl-2.4.8/include
-- Determining NCCL version from /opt/nccl-2.4.8/include/nccl.h...
-- Looking for NCCL_VERSION_CODE
-- Looking for NCCL_VERSION_CODE - not found
-- NCCL version < 2.3.5-5
-- Found NCCL (include: /opt/nccl-2.4.8/include, library: /opt/nccl-2.4.8/lib/libnccl.so)
-- Could NOT find CUB (missing: CUB_INCLUDE_DIR)
-- MPI include path: /opt/openmpi-2.1.1/include
-- MPI libraries: /opt/openmpi-2.1.1/lib/libmpi_cxx.so/opt/openmpi-2.1.1/lib/libmpi.so
-- Found CUDA: /usr/local/cuda-10.1 (found suitable version "10.1", minimum required is "7.0")
-- CUDA detected: 10.1
-- Could NOT find NCCL (missing: NCCL_INCLUDE_DIR)
CMake Warning at third_party/gloo/cmake/Dependencies.cmake:96 (message):
Not compiling with NCCL support. Suppress this warning with
-DUSE_NCCL=OFF.
Call Stack (most recent call first):
third_party/gloo/CMakeLists.txt:56 (include)
...
-- ******** Summary ********
-- General:
-- CMake version : 3.10.2
-- CMake command : /usr/bin/cmake
-- System : Linux
-- C++ compiler : /usr/bin/c++
-- C++ compiler id : GNU
-- C++ compiler version : 7.5.0
-- BLAS : MKL
-- CXX flags : -fvisibility-inlines-hidden -fopenmp -DTENSORRT_VERSION_MAJOR=6 -DTENSORRT_VERSION_MINOR=0 -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -O2 -fPIC -Wno-narrowing -Wall -Wextra -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Wno-stringop-overflow
-- Build type : Release
-- Compile definitions : TH_BLAS_MKL;ONNX_ML=1;ONNX_NAMESPACE=onnx_torch;MAGMA_V2;IDEEP_USE_MKL;HAVE_MMAP=1;_FILE_OFFSET_BITS=64;HAVE_SHM_OPEN=1;HAVE_SHM_UNLINK=1;HAVE_MALLOC_USABLE_SIZE=1
-- CMAKE_PREFIX_PATH : /home/escarda/anaconda3/envs/dl;/usr/local/cuda-10.1;/opt/nccl-2.4.8;/usr/local/cuda-10.1
-- CMAKE_INSTALL_PREFIX : /home/escarda/Programs/PyTorch/pytorch/torch
--
-- TORCH_VERSION : 1.4.0
-- CAFFE2_VERSION : 1.4.0
-- BUILD_CAFFE2_MOBILE : ON
-- USE_STATIC_DISPATCH : OFF
-- BUILD_BINARY : ON
-- BUILD_CUSTOM_PROTOBUF : ON
-- Link local protobuf : ON
-- BUILD_DOCS : ON
-- BUILD_PYTHON : True
-- Python version : 3.6.10
-- Python executable : /home/escarda/anaconda3/envs/dl/bin/python
-- Pythonlibs version : 3.6.10
-- Python library : /home/escarda/anaconda3/envs/dl/lib/libpython3.6m.so.1.0
-- Python includes : /home/escarda/anaconda3/envs/dl/include/python3.6m
-- Python site-packages: lib/python3.6/site-packages
-- BUILD_CAFFE2_OPS : ON
-- BUILD_SHARED_LIBS : ON
-- BUILD_TEST : True
-- BUILD_JNI : OFF
-- INTERN_BUILD_MOBILE :
-- USE_ASAN : OFF
-- USE_CUDA : ON
-- CUDA static link : OFF
-- USE_CUDNN : ON
-- CUDA version : 10.1
-- cuDNN version : 7.6.5
-- CUDA root directory : /usr/local/cuda-10.1
-- CUDA library : /usr/local/cuda-10.1/lib64/stubs/libcuda.so
-- cudart library : /usr/local/cuda-10.1/lib64/libcudart.so
-- cublas library : /usr/local/cuda-10.1/extras/system/lib64/libcublas.so
-- cufft library : /usr/local/cuda-10.1/lib64/libcufft.so
-- curand library : /usr/local/cuda-10.1/lib64/libcurand.so
-- cuDNN library : /usr/local/cuda-10.1/lib64/libcudnn.so
-- nvrtc : /usr/local/cuda-10.1/lib64/libnvrtc.so
-- CUDA include path : /usr/local/cuda-10.1/include
-- NVCC executable : /usr/local/cuda-10.1/bin/nvcc
-- CUDA host compiler : /usr/bin/cc
-- USE_TENSORRT : ON
-- TensorRT runtime library: /opt/TensorRT-6.0.1.5/lib/libnvinfer.so
-- TensorRT include path : /opt/TensorRT-6.0.1.5/include
-- USE_ROCM : OFF
-- USE_EIGEN_FOR_BLAS :
-- USE_FBGEMM : ON
-- USE_FFMPEG : ON
-- USE_GFLAGS : ON
-- USE_GLOG : ON
-- USE_LEVELDB : OFF
-- USE_LITE_PROTO : OFF
-- USE_LMDB : OFF
-- USE_METAL : OFF
-- USE_MKL : ON
-- USE_MKLDNN : ON
-- USE_MKLDNN_CBLAS : OFF
-- USE_NCCL : ON
-- USE_SYSTEM_NCCL : ON
-- USE_NNPACK : ON
-- USE_NUMPY : ON
-- USE_OBSERVERS : ON
-- USE_OPENCL : OFF
-- USE_OPENCV : ON
-- OpenCV version : 4.3.0
-- USE_OPENMP : ON
-- USE_TBB : OFF
-- USE_PROF : OFF
-- USE_QNNPACK : ON
-- USE_REDIS : OFF
-- USE_ROCKSDB : OFF
-- USE_ZMQ : OFF
-- USE_DISTRIBUTED : ON
-- USE_MPI : ON
-- USE_GLOO : ON
-- BUILD_NAMEDTENSOR : OFF
-- Public Dependencies : Threads::Threads;caffe2::mkl;glog::glog;caffe2::mkldnn
-- Private Dependencies : qnnpack;pytorch_qnnpack;nnpack;cpuinfo;fbgemm;/usr/lib/x86_64-linux-gnu/libnuma.so;opencv_core;opencv_highgui;opencv_imgproc;opencv_imgcodecs;opencv_optflow;opencv_videoio;opencv_video;/usr/lib/x86_64-linux-gnu/libavcodec.so;/usr/lib/x86_64-linux-gnu/libavformat.so;/usr/lib/x86_64-linux-gnu/libavutil.so;/usr/lib/x86_64-linux-gnu/libswscale.so;fp16;/opt/openmpi-2.1.1/lib/libmpi_cxx.so;/opt/openmpi-2.1.1/lib/libmpi.so;gloo;aten_op_header_gen;foxi_loader;rt;gcc_s;gcc;dl
-- Configuring done
CMake Warning (dev) at cmake/Dependencies.cmake:1067 (add_dependencies):
Policy CMP0046 is not set: Error on non-existent dependency in
add_dependencies. Run "cmake --help-policy CMP0046" for policy details.
Use the cmake_policy command to set the policy and suppress this warning.
The dependency target "nccl_external" of target "gloo_cuda" does not exist.
Call Stack (most recent call first):
CMakeLists.txt:380 (include)
This warning is for project developers. Use -Wno-dev to suppress it.
...
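The "Looking for NCCL_VERSION_CODE - not found" line in the output above comes from a compile check, and CMake of this vintage normally records the compiler output of failed checks in CMakeFiles/CMakeError.log inside the build directory. The following is a minimal sketch for pulling the relevant part out of that log. The helper name show_failed_check is hypothetical, and the log location is an assumption based on CMake's default layout rather than anything PyTorch-specific.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of PyTorch or CMake): print the section of a
# CMake error log that mentions a given symbol, with some trailing context.
# Assumption: the log lives at <build dir>/CMakeFiles/CMakeError.log.
show_failed_check() {
    local log_file="$1"   # path to CMakeError.log
    local symbol="$2"     # symbol the check looked for, e.g. NCCL_VERSION_CODE

    if [ ! -f "$log_file" ]; then
        echo "no such log: $log_file" >&2
        return 1
    fi

    # -n prints line numbers, -A 20 prints 20 lines after each match, which is
    # usually enough to see the compiler command and its error message.
    grep -n -A 20 -- "$symbol" "$log_file"
}

# Example call (path taken from the build directory used in this post):
# show_failed_check ~/Programs/PyTorch/pytorch/build/CMakeFiles/CMakeError.log NCCL_VERSION_CODE
```

If the log shows a compiler error such as a header that nccl.h includes not being found, that would point at the check's include paths rather than at the NCCL header itself.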
There are some slightly non-standard things in there due to CUDA, cuDNN, TensorRT etc. being local installs, but these installs are all found and used fine. The correct NCCL header is found (/opt/nccl-2.4.8/include/nccl.h) but NCCL_VERSION_CODE extraction fails, despite the fact that the following line is contained in the file:
#define NCCL_VERSION_CODE 2408
The correct library is also found (/opt/nccl-2.4.8/lib/libnccl.so), so why is cmake getting confused about the version? Any ideas?
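For reference, here is a small sketch of how the header can be checked at the text level, without involving the compiler. Both function names are hypothetical, and the decoding assumes the major*1000 + minor*100 + patch scheme, which is consistent with 2408 for NCCL 2.4.8 but may not hold for other releases.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the value of NCCL_VERSION_CODE as written in a
# given nccl.h. This is a plain text match and does not run the preprocessor.
nccl_header_version_code() {
    local header="$1"
    # Matches lines of the form: #define NCCL_VERSION_CODE 2408
    awk '$1 == "#define" && $2 == "NCCL_VERSION_CODE" { print $3; exit }' "$header"
}

# Hypothetical helper: decode a version code into major.minor.patch.
# Assumption: code = major*1000 + minor*100 + patch.
decode_nccl_version_code() {
    local code="$1"
    echo "$(( code / 1000 )).$(( (code % 1000) / 100 )).$(( code % 100 ))"
}

# Example call (header path from this post):
# decode_nccl_version_code "$(nccl_header_version_code /opt/nccl-2.4.8/include/nccl.h)"
```

Since this only reads the file as text, a correct result here alongside a failed cmake check would suggest the problem is in how the check compiles against the header, not in the header contents.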