Trouble building from source with CUDA

I have been meaning to contribute to PyTorch for a while, and so far I have only built it from source without CUDA.

USE_CUDA=0 python setup.py develop

works for me.
But when I try to build with CUDA 11.5, the build fails: NCCL, GLOO, and eventually CAFFE2 (if I disable the others) end up throwing an error.

System: Ubuntu 22.04 on MSI GV62.

On this system, the following command

 python setup.py develop

throws the following error

[2841/6918] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-2x4c2s4-minmax-fp32-xop-ld128.c.o
[2842/6918] Performing build step for 'nccl_external'

FAILED: nccl_external-prefix/src/nccl_external-stamp/nccl_external-build nccl/lib/libnccl_static.a 
cd /home/siddharth/pytorch/third_party/nccl/nccl && make -j6 -l6 CXX=/usr/bin/c++ CUDA_HOME=/usr NVCC=/usr/bin/nvcc "NVCC_GENCODE=-gencode=arch=compute_35,code=sm_35 -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_86,code=sm_86 -gencode=arch=compute_86,code=compute_86" BUILDDIR=/home/siddharth/pytorch/build/nccl VERBOSE=0 && /home/siddharth/anaconda3/bin/cmake -E touch /home/siddharth/pytorch/build/nccl_external-prefix/src/nccl_external-stamp/nccl_external-build
make -C src build BUILDDIR=/home/siddharth/pytorch/build/nccl
make[1]: Entering directory '/home/siddharth/pytorch/third_party/nccl/nccl/src'
NVCC_GENCODE is -gencode=arch=compute_35,code=sm_35 -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_86,code=sm_86 -gencode=arch=compute_86,code=compute_86
Grabbing   include/nccl_net.h                  > /home/siddharth/pytorch/build/nccl/include/nccl_net.h
Generating nccl.h.in                           > /home/siddharth/pytorch/build/nccl/include/nccl.h
Generating nccl.pc.in                          > /home/siddharth/pytorch/build/nccl/lib/pkgconfig/nccl.pc
Compiling  init.cc                             > /home/siddharth/pytorch/build/nccl/obj/init.o
Compiling  init_nvtx.cc                        > /home/siddharth/pytorch/build/nccl/obj/init_nvtx.o
Compiling  channel.cc                          > /home/siddharth/pytorch/build/nccl/obj/channel.o
Compiling  bootstrap.cc                        > /home/siddharth/pytorch/build/nccl/obj/bootstrap.o
Compiling  transport.cc                        > /home/siddharth/pytorch/build/nccl/obj/transport.o
Compiling  enqueue.cc                          > /home/siddharth/pytorch/build/nccl/obj/enqueue.o
Compiling  group.cc                            > /home/siddharth/pytorch/build/nccl/obj/group.o
Compiling  debug.cc                            > /home/siddharth/pytorch/build/nccl/obj/debug.o
Compiling  proxy.cc                            > /home/siddharth/pytorch/build/nccl/obj/proxy.o
Compiling  net.cc                              > /home/siddharth/pytorch/build/nccl/obj/net.o
Compiling  misc/cudawrap.cc                    > /home/siddharth/pytorch/build/nccl/obj/misc/cudawrap.o
Compiling  misc/nvmlwrap.cc                    > /home/siddharth/pytorch/build/nccl/obj/misc/nvmlwrap.o
Compiling  misc/ibvwrap.cc                     > /home/siddharth/pytorch/build/nccl/obj/misc/ibvwrap.o
Compiling  misc/gdrwrap.cc                     > /home/siddharth/pytorch/build/nccl/obj/misc/gdrwrap.o
Compiling  misc/utils.cc                       > /home/siddharth/pytorch/build/nccl/obj/misc/utils.o
Compiling  misc/argcheck.cc                    > /home/siddharth/pytorch/build/nccl/obj/misc/argcheck.o
Compiling  misc/socket.cc                      > /home/siddharth/pytorch/build/nccl/obj/misc/socket.o
Compiling  misc/shmutils.cc                    > /home/siddharth/pytorch/build/nccl/obj/misc/shmutils.o
Compiling  misc/profiler.cc                    > /home/siddharth/pytorch/build/nccl/obj/misc/profiler.o
Compiling  misc/param.cc                       > /home/siddharth/pytorch/build/nccl/obj/misc/param.o
Compiling  misc/strongstream.cc                > /home/siddharth/pytorch/build/nccl/obj/misc/strongstream.o
Compiling  transport/net_socket.cc             > /home/siddharth/pytorch/build/nccl/obj/transport/net_socket.o
Compiling  transport/net_ib.cc                 > /home/siddharth/pytorch/build/nccl/obj/transport/net_ib.o
Compiling  transport/coll_net.cc               > /home/siddharth/pytorch/build/nccl/obj/transport/coll_net.o
Compiling  transport/nvls.cc                   > /home/siddharth/pytorch/build/nccl/obj/transport/nvls.o
Compiling  collectives/sendrecv.cc             > /home/siddharth/pytorch/build/nccl/obj/collectives/sendrecv.o
Compiling  collectives/all_reduce.cc           > /home/siddharth/pytorch/build/nccl/obj/collectives/all_reduce.o
Compiling  collectives/all_gather.cc           > /home/siddharth/pytorch/build/nccl/obj/collectives/all_gather.o
Compiling  collectives/broadcast.cc            > /home/siddharth/pytorch/build/nccl/obj/collectives/broadcast.o
Compiling  collectives/reduce.cc               > /home/siddharth/pytorch/build/nccl/obj/collectives/reduce.o
Compiling  collectives/reduce_scatter.cc       > /home/siddharth/pytorch/build/nccl/obj/collectives/reduce_scatter.o
Compiling  graph/topo.cc                       > /home/siddharth/pytorch/build/nccl/obj/graph/topo.o
Compiling  graph/paths.cc                      > /home/siddharth/pytorch/build/nccl/obj/graph/paths.o
Compiling  graph/search.cc                     > /home/siddharth/pytorch/build/nccl/obj/graph/search.o
Compiling  graph/connect.cc                    > /home/siddharth/pytorch/build/nccl/obj/graph/connect.o
Compiling  graph/rings.cc                      > /home/siddharth/pytorch/build/nccl/obj/graph/rings.o
Compiling  graph/trees.cc                      > /home/siddharth/pytorch/build/nccl/obj/graph/trees.o
Compiling  graph/tuning.cc                     > /home/siddharth/pytorch/build/nccl/obj/graph/tuning.o
Compiling  graph/xml.cc                        > /home/siddharth/pytorch/build/nccl/obj/graph/xml.o
Compiling  enhcompat.cc                        > /home/siddharth/pytorch/build/nccl/obj/enhcompat.o
make[2]: Entering directory '/home/siddharth/pytorch/third_party/nccl/nccl/src/collectives/device'
NVCC_GENCODE is -gencode=arch=compute_35,code=sm_35 -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_86,code=sm_86 -gencode=arch=compute_86,code=compute_86
Generating rules                               > /home/siddharth/pytorch/build/nccl/obj/collectives/device/Makefile.rules
NVCC_GENCODE is -gencode=arch=compute_35,code=sm_35 -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_86,code=sm_86 -gencode=arch=compute_86,code=compute_86
Copying    sendrecv.cu                         > /home/siddharth/pytorch/build/nccl/obj/collectives/device/sendrecv_sum_i8.cu
nvcc warning : The 'compute_35', 'compute_37', 'compute_50', 'sm_35', 'sm_37' and 'sm_50' architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
Copying    sendrecv.cu                         > /home/siddharth/pytorch/build/nccl/obj/collectives/device/sendrecv_sum_u8.cu
Copying    sendrecv.cu                         > /home/siddharth/pytorch/build/nccl/obj/collectives/device/sendrecv_sum_i32.cu
Copying    sendrecv.cu                         > /home/siddharth/pytorch/build/nccl/obj/collectives/device/sendrecv_sum_u32.cu
Copying    sendrecv.cu                         > /home/siddharth/pytorch/build/nccl/obj/collectives/device/sendrecv_sum_i64.cu
Copying    sendrecv.cu                         > /home/siddharth/pytorch/build/nccl/obj/collectives/device/sendrecv_sum_u64.cu
Copying    sendrecv.cu                         > /home/siddharth/pytorch/build/nccl/obj/collectives/device/sendrecv_sum_f16.cu
Copying    sendrecv.cu                         > /home/siddharth/pytorch/build/nccl/obj/collectives/device/sendrecv_sum_f32.cu
Copying    sendrecv.cu                         > /home/siddharth/pytorch/build/nccl/obj/collectives/device/sendrecv_sum_f64.cu
Copying    sendrecv.cu                         > /home/siddharth/pytorch/build/nccl/obj/collectives/device/sendrecv_sum_bf16.cu
[Repeated a few more times]
Copying    all_reduce.cu                       > /home/siddharth/pytorch/build/nccl/obj/collectives/device/all_reduce_sum_i8.cu
nvcc warning : The 'compute_35', 'compute_37', 'compute_50', 'sm_35', 'sm_37' and 'sm_50' architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
Copying    all_reduce.cu                       > /home/siddharth/pytorch/build/nccl/obj/collectives/device/all_reduce_sum_u8.cu
Copying    all_reduce.cu                       > /home/siddharth/pytorch/build/nccl/obj/collectives/device/all_reduce_sum_i32.cu
Copying    all_reduce.cu                       > /home/siddharth/pytorch/build/nccl/obj/collectives/device/all_reduce_sum_u32.cu
Copying    all_reduce.cu                       > /home/siddharth/pytorch/build/nccl/obj/collectives/device/all_reduce_sum_i64.cu
[Repeated a few more times]
Copying    all_gather.cu                       > /home/siddharth/pytorch/build/nccl/obj/collectives/device/all_gather_sum_i8.cu
nvcc warning : The 'compute_35', 'compute_37', 'compute_50', 'sm_35', 'sm_37' and 'sm_50' architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
Copying    all_gather.cu                       > /home/siddharth/pytorch/build/nccl/obj/collectives/device/all_gather_sum_u8.cu
Copying    all_gather.cu                       > /home/siddharth/pytorch/build/nccl/obj/collectives/device/all_gather_sum_i32.cu
Copying    all_gather.cu                       > /home/siddharth/pytorch/build/nccl/obj/collectives/device/all_gather_sum_u32.cu
Copying    all_gather.cu                       > /home/siddharth/pytorch/build/nccl/obj/collectives/device/all_gather_sum_i64.cu
[Repeated a few more times]
Copying    broadcast.cu                        > /home/siddharth/pytorch/build/nccl/obj/collectives/device/broadcast_sum_i8.cu
nvcc warning : The 'compute_35', 'compute_37', 'compute_50', 'sm_35', 'sm_37' and 'sm_50' architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
Copying    broadcast.cu                        > /home/siddharth/pytorch/build/nccl/obj/collectives/device/broadcast_sum_u8.cu
Copying    broadcast.cu                        > /home/siddharth/pytorch/build/nccl/obj/collectives/device/broadcast_sum_i32.cu
Copying    broadcast.cu                        > /home/siddharth/pytorch/build/nccl/obj/collectives/device/broadcast_sum_u32.cu
[Repeated a few more times]
Copying    reduce.cu                           > /home/siddharth/pytorch/build/nccl/obj/collectives/device/reduce_sum_i8.cu
nvcc warning : The 'compute_35', 'compute_37', 'compute_50', 'sm_35', 'sm_37' and 'sm_50' architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
Copying    reduce.cu                           > /home/siddharth/pytorch/build/nccl/obj/collectives/device/reduce_sum_u8.cu
Copying    reduce.cu                           > /home/siddharth/pytorch/build/nccl/obj/collectives/device/reduce_sum_i32.cu
Copying    reduce.cu                           > /home/siddharth/pytorch/build/nccl/obj/collectives/device/reduce_sum_u32.cu
[Repeated a few more times]
nvcc warning : The 'compute_35', 'compute_37', 'compute_50', 'sm_35', 'sm_37' and 'sm_50' architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
Copying    reduce_scatter.cu                   > /home/siddharth/pytorch/build/nccl/obj/collectives/device/reduce_scatter_sum_u8.cu
Copying    reduce_scatter.cu                   > /home/siddharth/pytorch/build/nccl/obj/collectives/device/reduce_scatter_sum_i32.cu
Copying    reduce_scatter.cu                   > /home/siddharth/pytorch/build/nccl/obj/collectives/device/reduce_scatter_sum_u32.cu
Copying    reduce_scatter.cu                   > /home/siddharth/pytorch/build/nccl/obj/collectives/device/reduce_scatter_sum_i64.cu
Copying    reduce_scatter.cu                   > /home/siddharth/pytorch/build/nccl/obj/collectives/device/reduce_scatter_sum_u64.cu
Copying    reduce_scatter.cu                   > /home/siddharth/pytorch/build/nccl/obj/collectives/device/reduce_scatter_sum_f16.cu
Copying    reduce_scatter.cu                   > /home/siddharth/pytorch/build/nccl/obj/collectives/device/reduce_scatter_sum_f32.cu
[Repeated a few more times]
nvcc warning : The 'compute_35', 'compute_37', 'compute_50', 'sm_35', 'sm_37' and 'sm_50' architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
nvcc warning : The 'compute_35', 'compute_37', 'compute_50', 'sm_35', 'sm_37' and 'sm_50' architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
Compiling  sendrecv.cu                         > /home/siddharth/pytorch/build/nccl/obj/collectives/device/sendrecv_sum_i8.o
nvcc warning : The 'compute_35', 'compute_37', 'compute_50', 'sm_35', 'sm_37' and 'sm_50' architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
/usr/include/c++/11/bits/std_function.h:435:145: error: parameter packs not expanded with ‘...’:
  435 |         function(_Functor&& __f)
      |                                                                                                                                                 ^ 
/usr/include/c++/11/bits/std_function.h:435:145: note:         ‘_ArgTypes’
/usr/include/c++/11/bits/std_function.h:530:146: error: parameter packs not expanded with ‘...’:
  530 |         operator=(_Functor&& __f)
      |                                                                                                                                                  ^ 
/usr/include/c++/11/bits/std_function.h:530:146: note:         ‘_ArgTypes’
make[2]: *** [/home/siddharth/pytorch/build/nccl/obj/collectives/device/Makefile.rules:8: /home/siddharth/pytorch/build/nccl/obj/collectives/device/sendrecv_sum_i8.o] Error 1
make[2]: Leaving directory '/home/siddharth/pytorch/third_party/nccl/nccl/src/collectives/device'
make[1]: *** [Makefile:58: /home/siddharth/pytorch/build/nccl/obj/collectives/device/colldevice.a] Error 2
make[1]: Leaving directory '/home/siddharth/pytorch/third_party/nccl/nccl/src'
make: *** [Makefile:25: src.build] Error 2
[2843/6918] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-2x4c2s4-minmax-fp32-xop-ld64.c.o
[2844/6918] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-2x4c8-minmax-fp32-xop-ld64.c.o
[2845/6918] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-2x4c8-minmax-fp32-xop-ld128.c.o
[2846/6918] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-3x4c2-minmax-fp32-xop-ld64.c.o
[2847/6918] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-3x4c2-minmax-fp32-xop-ld128.c.o
[2848/6918] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-3x4c2s4-minmax-fp32-xop-ld64.c.o
[2849/6918] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-3x4c8-minmax-fp32-xop-ld64.c.o
[2850/6918] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-3x4c2s4-minmax-fp32-xop-ld128.c.o
[2851/6918] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-3x4c8-minmax-fp32-xop-ld128.c.o
[2852/6918] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-4x4c2-minmax-fp32-xop-ld128.c.o
[2853/6918] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-4x4c2-minmax-fp32-xop-ld64.c.o
[2854/6918] Building CXX object third_party/fbgemm/CMakeFiles/fbgemm_avx2.dir/src/FbgemmI8Depthwise3DAvx2.cc.o
[2855/6918] Building CXX object third_party/fbgemm/CMakeFiles/fbgemm_avx2.dir/src/FbgemmI8DepthwiseAvx2.cc.o
ninja: build stopped: subcommand failed.
Building wheel torch-2.1.0a0+git6c5fdde
-- Building version 2.1.0a0+git6c5fdde
cmake -GNinja -DBUILD_PYTHON=True -DBUILD_TEST=True -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=/home/siddharth/pytorch/torch -DCMAKE_PREFIX_PATH=/home/siddharth/anaconda3/lib/python3.8/site-packages;/home/siddharth/anaconda3 -DNUMPY_INCLUDE_DIR=/home/siddharth/anaconda3/lib/python3.8/site-packages/numpy/core/include -DPYTHON_EXECUTABLE=/home/siddharth/anaconda3/bin/python -DPYTHON_INCLUDE_DIR=/home/siddharth/anaconda3/include/python3.8 -DPYTHON_LIBRARY=/home/siddharth/anaconda3/lib/libpython3.8.so.1.0 -DTORCH_BUILD_VERSION=2.1.0a0+git6c5fdde -DUSE_NUMPY=True /home/siddharth/pytorch
cmake --build . --target install --config Release

Your GCC 11 compiler seems to be too new for the NCCL version being built, and you might need to downgrade to GCC 10 as described here.
If you are not working directly on the distributed backend and don't need to run distributed tests, you could also skip building NCCL and the distributed namespace.
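
Skipping NCCL and the distributed namespace could look like the sketch below. It assumes `USE_DISTRIBUTED` and `USE_NCCL` are the switches your checkout's setup.py reads; the header comment of setup.py lists the variables it actually supports, so check there first. The build command is left as a comment because it only makes sense from the PyTorch source root.

```shell
# Assumption: USE_DISTRIBUTED and USE_NCCL are honoured by this checkout's setup.py.
export USE_DISTRIBUTED=0   # skip the distributed package
export USE_NCCL=0          # skip building the bundled NCCL

# Then, from the PyTorch source root:
#   python setup.py develop
```

The original post reports that CAFFE2 fails once the other components are disabled, so with GCC 11 this alone may not be enough.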


First, install gcc-10 and g++-10 system-wide (Conda binaries won't be able to do the replacement),
then set the environment variables when running the build command, like:

export CMAKE_CUDA_HOST_COMPILER=/usr/bin/g++-10
CC=gcc-10 CXX=g++-10 python setup.py develop
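
Before rebuilding, it can help to confirm which GCC version the build would pick up. The snippet below is only a sketch: `gcc_major` and `needs_older_gcc` are hypothetical helper names, and the threshold of 10 comes from this thread (GCC 11 failing with CUDA 11.5), not from an official compatibility table.

```shell
# Hypothetical helpers, not part of PyTorch or CUDA.
# Print the major version from a string such as "11.3.0" or "11".
gcc_major() {
    printf '%s\n' "${1%%.*}"
}

# Succeed if the given version is newer than GCC 10.
# Assumption: 10 is the newest major version that works here (per this thread).
needs_older_gcc() {
    [ "$(gcc_major "$1")" -gt 10 ]
}

# Check the compiler that CXX points to, falling back to plain g++.
ver="$("${CXX:-g++}" -dumpversion 2>/dev/null || echo 0)"
if needs_older_gcc "$ver"; then
    echo "${CXX:-g++} is version $ver; consider CC=gcc-10 CXX=g++-10"
else
    echo "${CXX:-g++} reports version $ver"
fi
```

Running it once in a plain shell and once with `CXX=g++-10` exported shows whether the override is actually taking effect.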