__CUDA_NO_HALF2_OPERATORS__ for CUDA 9.2

Just some random thoughts:

I just tried to build PyTorch on the recently released CUDA 9.2 and hit some weird compiler errors, such as:

/usr/local/cuda-9.2/include/cuda_fp16.hpp(299): error: no operator "&&" matches these operands
            operand types are: __half && __half

/usr/local/cuda-9.2/include/cuda_fp16.hpp(300): error: no operator "&&" matches these operands
            operand types are: __half && __half

/usr/local/cuda-9.2/include/cuda_fp16.hpp(301): error: no operator "&&" matches these operands
            operand types are: __half && __half

/usr/local/cuda-9.2/include/cuda_fp16.hpp(302): error: no operator "&&" matches these operands
            operand types are: __half && __half

/usr/local/cuda-9.2/include/cuda_fp16.hpp(303): error: no operator "&&" matches these operands
            operand types are: __half && __half

/usr/local/cuda-9.2/include/cuda_fp16.hpp(304): error: no operator "&&" matches these operands
            operand types are: __half && __half
6 errors detected in the compilation of "/tmp/tmpxft_000040bf_00000000-6_THCReduceApplyUtils.cpp1.ii".
CMake Error at ATen_cuda_generated_THCReduceApplyUtils.cu.o.Release.cmake:279 (message):
  Error generating file
  /.../aten/build/src/ATen/CMakeFiles/ATen_cuda.dir/__/THC/./ATen_cuda_generated_THCReduceApplyUtils.cu.o

After some tinkering, I found that the build succeeds if I add __CUDA_NO_HALF2_OPERATORS__:

diff --git a/aten/CMakeLists.txt b/aten/CMakeLists.txt
index bdf3145..7620d23 100644
--- a/aten/CMakeLists.txt
+++ b/aten/CMakeLists.txt
@@ -165,7 +165,7 @@ ENDIF()
 
 IF(CUDA_HAS_FP16 OR NOT ${CUDA_VERSION} LESS 7.5)
   MESSAGE(STATUS "Found CUDA with FP16 support, compiling with torch.CudaHalfTensor")
-  LIST(APPEND CUDA_NVCC_FLAGS "-DCUDA_HAS_FP16=1 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__")
+  LIST(APPEND CUDA_NVCC_FLAGS "-DCUDA_HAS_FP16=1 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__")
   add_compile_options(-DCUDA_HAS_FP16=1)
 ELSE(CUDA_HAS_FP16 OR NOT ${CUDA_VERSION} LESS 7.5)
   MESSAGE(STATUS "Could not find CUDA with FP16 support, compiling without torch.CudaHalfTensor")

I don’t know if it applies to others, but if anybody’s trying to build with CUDA 9.2 and having problems, this might help.
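
For context, here is a minimal sketch (my own illustration, not taken from the PyTorch sources) of what these defines mean for kernel code: with the __CUDA_NO_HALF_OPERATORS__ / __CUDA_NO_HALF2_OPERATORS__ macros set, the built-in __half/__half2 operator overloads from cuda_fp16.h are not compiled, while the conversion intrinsics remain available, so half arithmetic can be routed through float:

// Hypothetical example kernel, compiled with
// -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__
#include <cuda_fp16.h>

__global__ void add_half(const __half* x, const __half* y, __half* out, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) {
    // Writing `x[i] + y[i]` would require the __half operator overloads that
    // the macros disable; the __half2float/__float2half intrinsics are unaffected.
    out[i] = __float2half(__half2float(x[i]) + __half2float(y[i]));
  }
}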

  • BTW, I didn’t realize 9.2 was only released three days ago! No wonder I had problems with the build… 🙂

It’s fixed on master by @ngimel!

Question: why are we defining symbols such as __CUDA_NO_HALF2_OPERATORS__, which seem to remove the possibility of casting between float16 and float32? Am I misunderstanding the effects of these defines? In any case, if I try using half instead of at::Half in custom torch extension kernels, I get weird casting errors. I’m trying to understand the background and context for this.

We use these flags so that the internal PyTorch half operations are used instead of the ones from the CUDA libraries.

This dates back quite a while, so I might be missing some details, but if I remember correctly, CUDA 9 added half operators to its half header, while Torch (Torch7 at the time) already shipped with its own.
The flags are used to keep the half type definitions from the CUDA header while not compiling its operators.
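
As a minimal sketch of what this means for extension code (my own illustration; assuming a PyTorch version where at::Half defines host/device arithmetic via float conversion), at::Half can be used directly in a .cu kernel without relying on the __half operators that these macros disable:

#include <ATen/ATen.h>

__global__ void scale_half(at::Half* data, at::Half factor, int64_t n) {
  int64_t i = blockIdx.x * (int64_t)blockDim.x + threadIdx.x;
  if (i < n) {
    // at::Half arithmetic goes through float internally, so no __half
    // operator overloads from cuda_fp16.h are needed here.
    data[i] = data[i] * factor;
  }
}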

What kind of issues are you seeing in your custom CUDA extension?

EDIT: follow-up question seems to be in this post.