I have a custom CUDA extension for PyTorch that used to work fine with PyTorch 1.4, CUDA 10.1, and Titan Xp GPUs. We recently moved our system to new A40 GPUs and CUDA 11.1. When I try to build the extension with CUDA 11.1, PyTorch 1.8.1, gcc 9.3.0, and Ubuntu 20.04, I get the following errors:
$ python3 setup.py install
running install
running bdist_egg
running egg_info
creating cuda_test.egg-info
writing cuda_test.egg-info/PKG-INFO
writing dependency_links to cuda_test.egg-info/dependency_links.txt
writing top-level names to cuda_test.egg-info/top_level.txt
writing manifest file 'cuda_test.egg-info/SOURCES.txt'
reading manifest file 'cuda_test.egg-info/SOURCES.txt'
writing manifest file 'cuda_test.egg-info/SOURCES.txt'
installing library code to build/bdist.linux-x86_64/egg
running install_lib
running build_ext
building 'cuda_test' extension
creating /path/to/code/cuda/test/build
creating /path/to/code/cuda/test/build/temp.linux-x86_64-3.7
Emitting ninja build file /path/to/code/cuda/test/build/temp.linux-x86_64-3.7/build.ninja...
Compiling objects...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[1/1] /cm/shared/apps/cuda11.1/toolkit/11.1.1/bin/nvcc --generate-dependencies-with-compile --dependency-output /path/to/code/cuda/test/build/temp.linux-x86_64-3.7/test_cuda.o.d -I/path/to/code/venv/lib/python3.7/site-packages/torch/include -I/path/to/code/venv/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -I/path/to/code/venv/lib/python3.7/site-packages/torch/include/TH -I/path/to/code/venv/lib/python3.7/site-packages/torch/include/THC -I/cm/shared/apps/cuda11.1/toolkit/11.1.1/include -I/path/to/code/venv/include/python3.7m -c -c /path/to/code/cuda/test/test_cuda.cu -o /path/to/code/cuda/test/build/temp.linux-x86_64-3.7/test_cuda.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=cuda_test -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 -std=c++14
FAILED: /path/to/code/cuda/test/build/temp.linux-x86_64-3.7/test_cuda.o
/cm/shared/apps/cuda11.1/toolkit/11.1.1/bin/nvcc --generate-dependencies-with-compile --dependency-output /path/to/code/cuda/test/build/temp.linux-x86_64-3.7/test_cuda.o.d -I/path/to/code/venv/lib/python3.7/site-packages/torch/include -I/path/to/code/venv/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -I/path/to/code/venv/lib/python3.7/site-packages/torch/include/TH -I/path/to/code/venv/lib/python3.7/site-packages/torch/include/THC -I/cm/shared/apps/cuda11.1/toolkit/11.1.1/include -I/path/to/code/venv/include/python3.7m -c -c /path/to/code/cuda/test/test_cuda.cu -o /path/to/code/cuda/test/build/temp.linux-x86_64-3.7/test_cuda.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=cuda_test -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 -std=c++14
/cm/shared/apps/cuda11.1/toolkit/11.1.1/include/thrust/detail/complex/arithmetic.h(256): error: identifier "FLT_MIN" is undefined
/cm/shared/apps/cuda11.1/toolkit/11.1.1/include/thrust/detail/complex/arithmetic.h(274): error: identifier "DBL_MIN" is undefined
/cm/shared/apps/cuda11.1/toolkit/11.1.1/include/thrust/detail/complex/catrig.h(190): error: identifier "DBL_EPSILON" is undefined
/cm/shared/apps/cuda11.1/toolkit/11.1.1/include/thrust/detail/complex/catrig.h(228): error: identifier "DBL_EPSILON" is undefined
/cm/shared/apps/cuda11.1/toolkit/11.1.1/include/thrust/detail/complex/catrig.h(243): error: identifier "DBL_EPSILON" is undefined
/cm/shared/apps/cuda11.1/toolkit/11.1.1/include/thrust/detail/complex/catrig.h(293): error: identifier "DBL_EPSILON" is undefined
/cm/shared/apps/cuda11.1/toolkit/11.1.1/include/thrust/detail/complex/catrig.h(406): error: identifier "DBL_EPSILON" is undefined
/cm/shared/apps/cuda11.1/toolkit/11.1.1/include/thrust/detail/complex/catrig.h(498): error: identifier "DBL_MAX" is undefined
/cm/shared/apps/cuda11.1/toolkit/11.1.1/include/thrust/detail/complex/catrig.h(562): error: identifier "DBL_MAX_EXP" is undefined
/cm/shared/apps/cuda11.1/toolkit/11.1.1/include/thrust/detail/complex/catrig.h(565): error: identifier "DBL_MANT_DIG" is undefined
/cm/shared/apps/cuda11.1/toolkit/11.1.1/include/thrust/detail/complex/catrig.h(630): error: identifier "DBL_EPSILON" is undefined
/cm/shared/apps/cuda11.1/toolkit/11.1.1/include/thrust/detail/complex/catrigf.h(119): error: identifier "FLT_EPSILON" is undefined
/cm/shared/apps/cuda11.1/toolkit/11.1.1/include/thrust/detail/complex/catrigf.h(137): error: identifier "FLT_EPSILON" is undefined
/cm/shared/apps/cuda11.1/toolkit/11.1.1/include/thrust/detail/complex/catrigf.h(147): error: identifier "FLT_EPSILON" is undefined
/cm/shared/apps/cuda11.1/toolkit/11.1.1/include/thrust/detail/complex/catrigf.h(170): error: identifier "FLT_EPSILON" is undefined
/cm/shared/apps/cuda11.1/toolkit/11.1.1/include/thrust/detail/complex/catrigf.h(249): error: identifier "FLT_EPSILON" is undefined
/cm/shared/apps/cuda11.1/toolkit/11.1.1/include/thrust/detail/complex/catrigf.h(327): error: identifier "FLT_MAX" is undefined
/cm/shared/apps/cuda11.1/toolkit/11.1.1/include/thrust/detail/complex/catrigf.h(375): error: identifier "FLT_MAX_EXP" is undefined
/cm/shared/apps/cuda11.1/toolkit/11.1.1/include/thrust/detail/complex/catrigf.h(377): error: identifier "FLT_MANT_DIG" is undefined
/cm/shared/apps/cuda11.1/toolkit/11.1.1/include/thrust/detail/complex/catrigf.h(420): error: identifier "FLT_EPSILON" is undefined
I also wrote a simple test program to verify that my more complex C++/CUDA code isn't the culprit, and it produced the same error messages. I also checked whether arithmetic.h and catrig.h include <cfloat>, which should provide the {FLT,DBL}_{MIN,MAX,EPSILON,MANT_DIG} definitions. That looks fine, and it's standard NVIDIA code anyway.
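For anyone who wants to repeat the header check, here is a minimal sketch of how it could be scripted. The function names `includes_float_limits` and `check_headers` are hypothetical helpers made up for illustration. The sketch only detects a direct `#include` of `<cfloat>` or `<float.h>`. It does not follow transitive includes, and it can be fooled by an include inside a block comment.

```python
import re
from pathlib import Path

# Matches a direct `#include <cfloat>` or `#include <float.h>` at the start
# of a line, allowing optional whitespace around the `#`.
_FLOAT_INCLUDE = re.compile(
    r'^\s*#\s*include\s*[<"](cfloat|float\.h)[>"]', re.MULTILINE
)


def includes_float_limits(source: str) -> bool:
    """Return True if the header text directly includes <cfloat> or <float.h>."""
    return _FLOAT_INCLUDE.search(source) is not None


def check_headers(include_dir, names):
    """Hypothetical helper: map each Thrust complex header to the check result.

    Assumes the headers live under <include_dir>/thrust/detail/complex/,
    as the paths in the error messages suggest.
    """
    result = {}
    for name in names:
        path = Path(include_dir) / "thrust" / "detail" / "complex" / name
        result[name] = includes_float_limits(path.read_text(errors="replace"))
    return result
```

With the toolkit path from the log above, a call could look like `check_headers("/cm/shared/apps/cuda11.1/toolkit/11.1.1/include", ["arithmetic.h", "catrig.h", "catrigf.h"])`. A `True` result only says the include directive is present in the file. It does not show that the macros survive preprocessing in the actual nvcc invocation.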
Here are a few more things to note:
- The CUDA code compiles when I use CUDA 10.1, PyTorch 1.4.0, gcc 9.3.0, Ubuntu 20.04, and Titan Xp GPUs.
- Using PyTorch 1.5.1 instead generates the following error:
/usr/include/c++/9/bits/stl_function.h(437): error: identifier "__builtin_is_constant_evaluated" is undefined
but this can be solved by downgrading gcc to version 7.5 or 8.4.
- Using PyTorch 1.6.0 or higher instead always results in the errors shown above, even when using gcc-7 or gcc-8 and different GPUs.
To me this looks like a PyTorch bug. Tips and ideas on how to solve this problem are much appreciated.