Compiling PyTorch 1.10.1 from source with GCC 11 and CUDA 11.5

When compiling PyTorch 1.10.1 with GCC 11.2.0 and CUDA 11.5, I get these errors about cub:

[5447/6571] Linking CXX static library lib/libCaffe2_perfkernels_avx.a
[5448/6571] Linking CXX static library lib/libCaffe2_perfkernels_avx2.a
[5449/6571] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/BinaryOpsKernel.cpp.AVX512.cpp.o
[5450/6571] Linking CXX shared library lib/libtorch_cpu.so
[5451/6571] Linking CXX executable bin/FileStoreTest
[5452/6571] Linking CXX executable bin/HashStoreTest
[5453/6571] Linking CXX executable bin/TCPStoreTest
[5454/6571] Building NVCC (Device) object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/torch_cuda_generated_SegmentReduce.cu.o
FAILED: caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/torch_cuda_generated_SegmentReduce.cu.o /dev/shm/strube1/juwelsbooster/PyTorch/1.10.1/gcccoremkl-11.2.0-2021.4.0-CUDA-11.5/pytorch/build/caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/torch_cuda_generated_SegmentReduce.cu.o 
cd /dev/shm/strube1/juwelsbooster/PyTorch/1.10.1/gcccoremkl-11.2.0-2021.4.0-CUDA-11.5/pytorch/build/caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda && /p/software/juwelsbooster/stages/2022/software/CMake/3.21.1-GCCcore-11.2.0/bin/cmake -E make_directory /dev/shm/strube1/juwelsbooster/PyTorch/1.10.1/gcccoremkl-11.2.0-2021.4.0-CUDA-11.5/pytorch/build/caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/. && /p/software/juwelsbooster/stages/2022/software/CMake/3.21.1-GCCcore-11.2.0/bin/cmake -D verbose:BOOL=OFF -D build_configuration:STRING=Release -D generated_file:STRING=/dev/shm/strube1/juwelsbooster/PyTorch/1.10.1/gcccoremkl-11.2.0-2021.4.0-CUDA-11.5/pytorch/build/caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/./torch_cuda_generated_SegmentReduce.cu.o -D generated_cubin_file:STRING=/dev/shm/strube1/juwelsbooster/PyTorch/1.10.1/gcccoremkl-11.2.0-2021.4.0-CUDA-11.5/pytorch/build/caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/./torch_cuda_generated_SegmentReduce.cu.o.cubin.txt -P /dev/shm/strube1/juwelsbooster/PyTorch/1.10.1/gcccoremkl-11.2.0-2021.4.0-CUDA-11.5/pytorch/build/caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/torch_cuda_generated_SegmentReduce.cu.o.Release.cmake
/p/software/juwelsbooster/stages/2022/software/CUDA/11.5/include/cub/block/../iterator/../util_device.cuh(150): error: namespace "cub" has no member "Debug"

/p/software/juwelsbooster/stages/2022/software/CUDA/11.5/include/cub/util_allocator.cuh(364): error: namespace "cub" has no member "Debug"

[... the same "namespace "cub" has no member "Debug"" error repeats 4 more times in util_device.cuh and 21 more times in util_allocator.cuh ...]

/p/software/juwelsbooster/stages/2022/software/CUDA/11.5/include/cub/device/dispatch/dispatch_scan.cuh(420): error: namespace "cub" has no member "Debug"
          detected during instantiation of "cudaError_t at::cuda::detail::cub::DeviceScan::InclusiveSum(void *, size_t &, InputIteratorT, OutputIteratorT, int, cudaStream_t, __nv_bool) [with InputIteratorT=int32_t *, OutputIteratorT=int32_t *]" 
/dev/shm/strube1/juwelsbooster/PyTorch/1.10.1/gcccoremkl-11.2.0-2021.4.0-CUDA-11.5/pytorch/aten/src/ATen/native/cuda/SegmentReduce.cu(54): here

/p/software/juwelsbooster/stages/2022/software/CUDA/11.5/include/cub/device/dispatch/dispatch_scan.cuh(436): error: namespace "cub" has no member "Debug"
          detected during instantiation of "cudaError_t at::cuda::detail::cub::DeviceScan::InclusiveSum(void *, size_t &, InputIteratorT, OutputIteratorT, int, cudaStream_t, __nv_bool) [with InputIteratorT=int32_t *, OutputIteratorT=int32_t *]" 
/dev/shm/strube1/juwelsbooster/PyTorch/1.10.1/gcccoremkl-11.2.0-2021.4.0-CUDA-11.5/pytorch/aten/src/ATen/native/cuda/SegmentReduce.cu(54): here

/p/software/juwelsbooster/stages/2022/software/CUDA/11.5/include/cub/device/dispatch/dispatch_scan.cuh(282): error: namespace "cub" has no member "Debug"
          detected during:
            instantiation of "cudaError_t at::cuda::detail::cub::DispatchScan<InputIteratorT, OutputIteratorT, ScanOpT, InitValueT, OffsetT, SelectedPolicy>::Invoke<ActivePolicyT>() [with InputIteratorT=int32_t *, OutputIteratorT=int32_t *, ScanOpT=at::cuda::detail::cub::Sum, InitValueT=at::cuda::detail::cub::NullType, OffsetT=int, SelectedPolicy=at::cuda::detail::cub::DeviceScanPolicy<int32_t>, ActivePolicyT=at::cuda::detail::cub::DeviceScanPolicy<int32_t>::Policy350]" 
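For context on these errors: CUDA 11.5 ships cub 1.13, which added a third namespace-wrapping macro, CUB_NS_QUALIFIER, next to the older CUB_NS_PREFIX/CUB_NS_POSTFIX pair. PyTorch 1.10 wraps its bundled cub into at::cuda::detail::cub using only the prefix/postfix pair, so cub's internal calls fall back to plain ::cub, where Debug no longer lives. A minimal sketch of the mechanism, with the exact PyTorch header path and macro expansion quoted from memory and therefore to be taken as assumptions:

    // What PyTorch 1.10 does (roughly, in aten/src/ATen/cuda/cub.cuh):
    // wrap the bundled cub in a private namespace to avoid collisions
    // with a user's own copy of cub.
    #define CUB_NS_PREFIX namespace at { namespace cuda { namespace detail {
    #define CUB_NS_POSTFIX }}}

    // cub 1.13 (CUDA 11.5) refers to its own symbols through a qualifier
    // macro that defaults to plain ::cub when left undefined, roughly:
    //   #define CubDebug(e) CUB_NS_QUALIFIER::Debug((cudaError_t)(e), __FILE__, __LINE__)
    // With only PREFIX/POSTFIX defined, Debug lands in at::cuda::detail::cub
    // while the expansion still looks in ::cub -- hence the errors above.

    // The missing piece, which later PyTorch versions define before
    // including cub, is essentially:
    #define CUB_NS_QUALIFIER ::at::cuda::detail::cub

    #include <cub/cub.cuh>

The qualifier define landed on master after the 1.10 branch was cut, which is why the nightlies build against CUDA 11.5 while 1.10.x does not.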

OK, so I was informed by Facebook that PyTorch 1.10 is INCOMPATIBLE with CUDA 11.5.

They could’ve written this somewhere.

That’s wrong, as we are building PyTorch with CUDA 11.5, and you can also download the nightly binaries built with CUDA 11.5, e.g. via:

pip3 install --pre torch -f https://download.pytorch.org/whl/nightly/cu115/torch_nightly.html

so I don’t know where this information is coming from.

EDIT: I was wrong, as 1.10.0 was cut before the CUDA 11.5 release (and 1.10.1 is a bugfix release), so you would need to use the latest nightly or install the wheels.

The Jülich Supercomputing Centre and many other HPC centers around the world use EasyBuild (https://easybuild.io/) to compile all software from source. Wheels are not an option.

I’ve backported cub (from CUDA 11.5) into 1.10.1 myself, but I’m now having problems with OpenMP being set incorrectly somewhere in PyTorch’s code.

A timeframe for 1.11 would be most welcome.

I just hit this problem as well with Spack (https://spack.io). Does PyTorch keep track of supported CUDA versions for each release? So far I’ve found:

  • CUDA 7.5+ for older PyTorch
  • CUDA 9+ for PyTorch 1.1+
  • CUDA 9.2+ for PyTorch 1.6+

from CMake, but I’ve never seen an upper bound. I’m guessing that CUDA is handled like Python: new versions are assumed to be compatible, but that isn’t guaranteed when CUDA/Python break backwards compatibility?

Anyway, I’ll add a constraint to our PyTorch build recipe saying that PyTorch 1.10 and older require CUDA 11.4 and older. Let me know if there are any other known constraints.
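If it helps anyone pinning this by hand, the incompatibility can also be caught with a small compile-time probe, since cuda.h exposes CUDA_VERSION encoded as major * 1000 + minor * 10 (11050 for 11.5). A hypothetical guard, not taken from PyTorch or Spack:

    // Hypothetical configure-time probe: fail early when building
    // PyTorch <= 1.10 against CUDA 11.5+, whose bundled cub requires
    // the CUB_NS_QUALIFIER define that 1.10 lacks.
    #include <cuda.h> // defines CUDA_VERSION, e.g. 11050 for CUDA 11.5

    #if defined(CUDA_VERSION) && CUDA_VERSION >= 11050
    #error "PyTorch <= 1.10 needs CUDA <= 11.4 (or a backported CUB_NS_QUALIFIER fix)"
    #endif

    int main() { return 0; } // compiles only against CUDA <= 11.4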