Custom CUDA extension build fails for torch 1.6.0 or higher

I have a custom CUDA extension for PyTorch that used to work fine with PyTorch 1.4, CUDA 10.1, and Titan Xp GPUs. However, we recently switched our system to new A40 GPUs and CUDA 11.1. When I try to build the extension with CUDA 11.1, PyTorch 1.8.1, gcc 9.3.0, and Ubuntu 20.04, I get the following errors:

$ python3 setup.py install
running install
running bdist_egg
running egg_info
creating cuda_test.egg-info
writing cuda_test.egg-info/PKG-INFO
writing dependency_links to cuda_test.egg-info/dependency_links.txt
writing top-level names to cuda_test.egg-info/top_level.txt
writing manifest file 'cuda_test.egg-info/SOURCES.txt'
reading manifest file 'cuda_test.egg-info/SOURCES.txt'
writing manifest file 'cuda_test.egg-info/SOURCES.txt'
installing library code to build/bdist.linux-x86_64/egg
running install_lib
running build_ext
building 'cuda_test' extension
creating /path/to/code/cuda/test/build
creating /path/to/code/cuda/test/build/temp.linux-x86_64-3.7
Emitting ninja build file /path/to/code/cuda/test/build/temp.linux-x86_64-3.7/build.ninja...
Compiling objects...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[1/1] /cm/shared/apps/cuda11.1/toolkit/11.1.1/bin/nvcc --generate-dependencies-with-compile --dependency-output /path/to/code/cuda/test/build/temp.linux-x86_64-3.7/test_cuda.o.d -I/path/to/code/venv/lib/python3.7/site-packages/torch/include -I/path/to/code/venv/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -I/path/to/code/venv/lib/python3.7/site-packages/torch/include/TH -I/path/to/code/venv/lib/python3.7/site-packages/torch/include/THC -I/cm/shared/apps/cuda11.1/toolkit/11.1.1/include -I/path/to/code/venv/include/python3.7m -c -c /path/to/code/cuda/test/test_cuda.cu -o /path/to/code/cuda/test/build/temp.linux-x86_64-3.7/test_cuda.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=cuda_test -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 -std=c++14
FAILED: /path/to/code/cuda/test/build/temp.linux-x86_64-3.7/test_cuda.o
/cm/shared/apps/cuda11.1/toolkit/11.1.1/bin/nvcc --generate-dependencies-with-compile --dependency-output /path/to/code/cuda/test/build/temp.linux-x86_64-3.7/test_cuda.o.d -I/path/to/code/venv/lib/python3.7/site-packages/torch/include -I/path/to/code/venv/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -I/path/to/code/venv/lib/python3.7/site-packages/torch/include/TH -I/path/to/code/venv/lib/python3.7/site-packages/torch/include/THC -I/cm/shared/apps/cuda11.1/toolkit/11.1.1/include -I/path/to/code/venv/include/python3.7m -c -c /path/to/code/cuda/test/test_cuda.cu -o /path/to/code/cuda/test/build/temp.linux-x86_64-3.7/test_cuda.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=cuda_test -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 -std=c++14
/cm/shared/apps/cuda11.1/toolkit/11.1.1/include/thrust/detail/complex/arithmetic.h(256): error: identifier "FLT_MIN" is undefined

/cm/shared/apps/cuda11.1/toolkit/11.1.1/include/thrust/detail/complex/arithmetic.h(274): error: identifier "DBL_MIN" is undefined

/cm/shared/apps/cuda11.1/toolkit/11.1.1/include/thrust/detail/complex/catrig.h(190): error: identifier "DBL_EPSILON" is undefined

/cm/shared/apps/cuda11.1/toolkit/11.1.1/include/thrust/detail/complex/catrig.h(228): error: identifier "DBL_EPSILON" is undefined

/cm/shared/apps/cuda11.1/toolkit/11.1.1/include/thrust/detail/complex/catrig.h(243): error: identifier "DBL_EPSILON" is undefined

/cm/shared/apps/cuda11.1/toolkit/11.1.1/include/thrust/detail/complex/catrig.h(293): error: identifier "DBL_EPSILON" is undefined

/cm/shared/apps/cuda11.1/toolkit/11.1.1/include/thrust/detail/complex/catrig.h(406): error: identifier "DBL_EPSILON" is undefined

/cm/shared/apps/cuda11.1/toolkit/11.1.1/include/thrust/detail/complex/catrig.h(498): error: identifier "DBL_MAX" is undefined

/cm/shared/apps/cuda11.1/toolkit/11.1.1/include/thrust/detail/complex/catrig.h(562): error: identifier "DBL_MAX_EXP" is undefined

/cm/shared/apps/cuda11.1/toolkit/11.1.1/include/thrust/detail/complex/catrig.h(565): error: identifier "DBL_MANT_DIG" is undefined

/cm/shared/apps/cuda11.1/toolkit/11.1.1/include/thrust/detail/complex/catrig.h(630): error: identifier "DBL_EPSILON" is undefined

/cm/shared/apps/cuda11.1/toolkit/11.1.1/include/thrust/detail/complex/catrigf.h(119): error: identifier "FLT_EPSILON" is undefined

/cm/shared/apps/cuda11.1/toolkit/11.1.1/include/thrust/detail/complex/catrigf.h(137): error: identifier "FLT_EPSILON" is undefined

/cm/shared/apps/cuda11.1/toolkit/11.1.1/include/thrust/detail/complex/catrigf.h(147): error: identifier "FLT_EPSILON" is undefined

/cm/shared/apps/cuda11.1/toolkit/11.1.1/include/thrust/detail/complex/catrigf.h(170): error: identifier "FLT_EPSILON" is undefined

/cm/shared/apps/cuda11.1/toolkit/11.1.1/include/thrust/detail/complex/catrigf.h(249): error: identifier "FLT_EPSILON" is undefined

/cm/shared/apps/cuda11.1/toolkit/11.1.1/include/thrust/detail/complex/catrigf.h(327): error: identifier "FLT_MAX" is undefined

/cm/shared/apps/cuda11.1/toolkit/11.1.1/include/thrust/detail/complex/catrigf.h(375): error: identifier "FLT_MAX_EXP" is undefined

/cm/shared/apps/cuda11.1/toolkit/11.1.1/include/thrust/detail/complex/catrigf.h(377): error: identifier "FLT_MANT_DIG" is undefined

/cm/shared/apps/cuda11.1/toolkit/11.1.1/include/thrust/detail/complex/catrigf.h(420): error: identifier "FLT_EPSILON" is undefined

I also wrote a simple test extension to verify that my more complex C++/CUDA code isn't the culprit; it produced the same error messages. I also checked whether arithmetic.h and catrig.h include <cfloat>, which should provide the {FLT,DBL}_{MIN,MAX,EPSILON,MANT_DIG} definitions. They do, and it's standard NVIDIA code anyway.
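To make the pattern in the log above easier to see, here is a minimal Python sketch that tallies the undefined identifiers from nvcc output and checks whether they all look like <cfloat> macros. The function names (`summarize_undefined`, `all_from_cfloat`) and the `SAMPLE_LOG` excerpt are illustrative assumptions, not part of any PyTorch or CUDA tooling.

```python
import re
from collections import Counter

# Matches nvcc diagnostics of the form:
#   /path/to/header.h(256): error: identifier "FLT_MIN" is undefined
UNDEFINED_RE = re.compile(
    r'^(?P<file>.+?)\((?P<line>\d+)\): error: identifier "(?P<name>\w+)" is undefined'
)

# A short excerpt of the build log, used only as sample input.
SAMPLE_LOG = """\
/cm/shared/apps/cuda11.1/toolkit/11.1.1/include/thrust/detail/complex/arithmetic.h(256): error: identifier "FLT_MIN" is undefined

/cm/shared/apps/cuda11.1/toolkit/11.1.1/include/thrust/detail/complex/catrig.h(190): error: identifier "DBL_EPSILON" is undefined

/cm/shared/apps/cuda11.1/toolkit/11.1.1/include/thrust/detail/complex/catrig.h(228): error: identifier "DBL_EPSILON" is undefined
"""


def summarize_undefined(log_text):
    """Count undefined identifiers and collect the header files reporting them."""
    names = Counter()
    headers = set()
    for raw in log_text.splitlines():
        match = UNDEFINED_RE.match(raw.strip())
        if match:
            names[match.group("name")] += 1
            # Keep only the file name, not the full toolkit path.
            headers.add(match.group("file").rsplit("/", 1)[-1])
    return names, headers


def all_from_cfloat(names):
    """True if at least one name was found and every name is an FLT_/DBL_/LDBL_ macro."""
    pattern = re.compile(r"(FLT|DBL|LDBL)_[A-Z0-9_]+")
    return bool(names) and all(pattern.fullmatch(n) for n in names)


if __name__ == "__main__":
    names, headers = summarize_undefined(SAMPLE_LOG)
    for name, count in sorted(names.items()):
        print(f"{name}: {count}")
    print("headers:", ", ".join(sorted(headers)))
    print("only <cfloat> macros:", all_from_cfloat(names))
```

If every missing identifier comes from the same standard header, the extension code itself is unlikely to be at fault, and the build environment becomes the more likely suspect.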
Here are a few more things to note:

  1. The CUDA code compiles when I use CUDA 10.1, PyTorch 1.4.0, gcc 9.3.0, Ubuntu 20.04, and Titan Xp GPUs.
  2. Using PyTorch 1.5.1 instead generates the following error:
    /usr/include/c++/9/bits/stl_function.h(437): error: identifier "__builtin_is_constant_evaluated" is undefined
    but this can be solved by downgrading gcc to version 7.5 or 8.4.
  3. Using PyTorch 1.6.0 or higher instead always results in the errors shown above, even with gcc-7 or gcc-8 and different GPUs.

To me this looks like a PyTorch bug. Tips and ideas on how to solve this problem are much appreciated.

Can you confirm whether the build issue persists when you are building upstream PyTorch without any modifications?

The problem occurs when compiling a CUDA extension for PyTorch, similar to the tutorial "Custom C++ and CUDA Extensions" (PyTorch Tutorials 1.8.1+cu102 documentation), not when building PyTorch itself. I hope this answers the question; I may not have understood what is meant by "modifications" in this context.

Ok, I found the mistake! It was caused by an issue with the installed Intel MKL library, which is fixed now. That was pretty hard to guess from the error messages thrown by the compiler.
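The thread does not say how the MKL installation caused the errors. As one hypothetical diagnostic for similar symptoms, assuming (this is not confirmed here) that a header on a user-added include path is shadowing the compiler's own float.h, the Python sketch below lists the directories GCC picks up from the CPATH, C_INCLUDE_PATH, and CPLUS_INCLUDE_PATH environment variables and reports any float.h or cfloat files found in them. The helper names are made up for illustration.

```python
import os
from pathlib import Path

# Header names whose accidental shadowing could hide the FLT_*/DBL_* macros.
SHADOW_CANDIDATES = ("float.h", "cfloat")

# Environment variables that GCC consults for extra include directories.
INCLUDE_ENV_VARS = ("CPATH", "C_INCLUDE_PATH", "CPLUS_INCLUDE_PATH")


def include_dirs_from_env(environ=None):
    """Return the unique include directories listed in the GCC environment variables."""
    environ = os.environ if environ is None else environ
    dirs = []
    for var in INCLUDE_ENV_VARS:
        for entry in environ.get(var, "").split(os.pathsep):
            if entry and entry not in dirs:
                dirs.append(entry)
    return dirs


def find_shadowing_headers(dirs, names=SHADOW_CANDIDATES):
    """Return paths of candidate headers that exist directly inside the given directories."""
    hits = []
    for directory in dirs:
        for name in names:
            candidate = Path(directory) / name
            if candidate.is_file():
                hits.append(str(candidate))
    return hits


if __name__ == "__main__":
    dirs = include_dirs_from_env()
    print("include dirs from environment:", dirs or "(none)")
    for hit in find_shadowing_headers(dirs):
        print("possible shadowing header:", hit)
```

A hit is only a hint worth inspecting by hand; an empty result does not rule out other environment problems.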
