It looks like custom kernels use `at::Half` instead of `half`? And using `half` itself doesn't seem to work, e.g. casts between `half` and `float` fail? I'm guessing these cast failures have something to do with the definitions of `__CUDA_NO_HALF_OPERATORS__` and `__CUDA_NO_HALF2_OPERATORS__` in the compilation step?
Then I figured that to use `half2` I would use e.g. `at::Half2`, but this doesn't seem to exist?
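For reference, here is a minimal sketch of the conversions that still compile under those flags: explicit intrinsics from `cuda_fp16.h` (`__half2float`, `__float2half`) instead of implicit casts or `half` operators. It assumes the extension build defines `__CUDA_NO_HALF_OPERATORS__`, `__CUDA_NO_HALF_CONVERSIONS__` and `__CUDA_NO_HALF2_OPERATORS__` (mimicked with `#define` here), and `scale_half` / `scale_via_half` are hypothetical names. Inside an extension, the `at::Half*` from `tensor.data_ptr<at::Half>()` is often `reinterpret_cast` to `__half*`, which assumes the two types share the same 16-bit layout.

```cuda
// Assumption: mimic the macros a PyTorch extension build passes to nvcc.
#ifndef __CUDA_NO_HALF_OPERATORS__
#define __CUDA_NO_HALF_OPERATORS__
#endif
#ifndef __CUDA_NO_HALF_CONVERSIONS__
#define __CUDA_NO_HALF_CONVERSIONS__
#endif
#ifndef __CUDA_NO_HALF2_OPERATORS__
#define __CUDA_NO_HALF2_OPERATORS__
#endif

#include <cuda_fp16.h>
#include <cuda_runtime.h>
#include <cassert>
#include <vector>

// Hypothetical kernel: out[i] = in[i] * s, stored as fp16, computed in fp32.
__global__ void scale_half(const __half* in, __half* out, float s, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        // float x = in[i];            // implicit half->float: rejected under the macros
        // __half y = in[i] * in[i];   // half operator*: also rejected
        float x = __half2float(in[i]);  // explicit intrinsic still compiles
        out[i] = __float2half(x * s);   // round-to-nearest float->half
    }
}

// Hypothetical host helper: converts to fp16, runs the kernel, converts back.
// Returns false if any CUDA call fails.
bool scale_via_half(const float* in, float* out, float s, int n) {
    assert(in != nullptr && out != nullptr && n > 0);
    std::vector<__half> h(n);
    for (int i = 0; i < n; ++i) h[i] = __float2half(in[i]);  // host-side conversion

    const size_t bytes = n * sizeof(__half);
    __half *d_in = nullptr, *d_out = nullptr;
    if (cudaMalloc(&d_in, bytes) != cudaSuccess) return false;
    if (cudaMalloc(&d_out, bytes) != cudaSuccess) { cudaFree(d_in); return false; }

    cudaError_t err = cudaMemcpy(d_in, h.data(), bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        const int threads = 128;
        scale_half<<<(n + threads - 1) / threads, threads>>>(d_in, d_out, s, n);
        err = cudaGetLastError();  // catch launch errors
    }
    if (err == cudaSuccess)
        err = cudaMemcpy(h.data(), d_out, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    if (err != cudaSuccess) return false;

    for (int i = 0; i < n; ++i) out[i] = __half2float(h[i]);
    return true;
}
```

Calling `scale_via_half` with inputs `{0, 1, 2, 3}` and `s = 2.0f` should leave `{0, 2, 4, 6}` in `out`, since all of those values are exactly representable in fp16.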
- Why do we use `at::Half` instead of `half`?
- Why are the `half` casts etc. removed?
- What should we do if we want to use `half2`? (Or is there some way to get the compiler to handle `half2` vectorization for us, just as g++ handles SSE vectorization automatically in CPU code?)
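On the `half2` point, one option is to vectorize by hand with the `__half2` type and intrinsics from `cuda_fp16.h`. A minimal sketch, assuming an even element count and buffers from `cudaMalloc` (aligned well beyond the 4 bytes `__half2` needs): arithmetic is done in fp32 via `__half22float2` / `__floats2half2_rn`, so only the loads and stores are paired. On sm_53 and newer, packed intrinsics such as `__hmul2` could replace the fp32 round trip; intrinsics, unlike `operator*`, are not disabled by `__CUDA_NO_HALF2_OPERATORS__`. `scale_half2` and `scale_via_half2` are hypothetical names.

```cuda
// Assumption: same macros as a PyTorch extension build; only intrinsics are used below.
#ifndef __CUDA_NO_HALF_OPERATORS__
#define __CUDA_NO_HALF_OPERATORS__
#endif
#ifndef __CUDA_NO_HALF_CONVERSIONS__
#define __CUDA_NO_HALF_CONVERSIONS__
#endif
#ifndef __CUDA_NO_HALF2_OPERATORS__
#define __CUDA_NO_HALF2_OPERATORS__
#endif

#include <cuda_fp16.h>
#include <cuda_runtime.h>
#include <cassert>
#include <vector>

// Hypothetical kernel: each thread handles one __half2, i.e. two fp16 values
// moved with a single 32-bit load and a single 32-bit store.
__global__ void scale_half2(const __half2* in, __half2* out, float s, int n2) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n2) {
        float2 v = __half22float2(in[i]);              // unpack both lanes to fp32
        out[i] = __floats2half2_rn(v.x * s, v.y * s);  // repack, round-to-nearest
    }
}

// Hypothetical host helper; n must be even so the data splits into __half2 pairs.
bool scale_via_half2(const float* in, float* out, float s, int n) {
    assert(in != nullptr && out != nullptr && n > 0 && n % 2 == 0);
    std::vector<__half> h(n);
    for (int i = 0; i < n; ++i) h[i] = __float2half(in[i]);

    const size_t bytes = n * sizeof(__half);
    __half *d_in = nullptr, *d_out = nullptr;
    if (cudaMalloc(&d_in, bytes) != cudaSuccess) return false;
    if (cudaMalloc(&d_out, bytes) != cudaSuccess) { cudaFree(d_in); return false; }

    cudaError_t err = cudaMemcpy(d_in, h.data(), bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        const int n2 = n / 2, threads = 128;
        // View the fp16 buffers as pairs; cudaMalloc alignment makes this safe.
        scale_half2<<<(n2 + threads - 1) / threads, threads>>>(
            reinterpret_cast<const __half2*>(d_in),
            reinterpret_cast<__half2*>(d_out), s, n2);
        err = cudaGetLastError();
    }
    if (err == cudaSuccess)
        err = cudaMemcpy(h.data(), d_out, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    if (err != cudaSuccess) return false;

    for (int i = 0; i < n; ++i) out[i] = __half2float(h[i]);
    return true;
}
```

Handling an odd tail element would need a separate scalar `__half` path, which this sketch leaves out.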