How to use half2 in custom kernels?

It looks like custom kernels use at::Half instead of half, and using half itself doesn't seem to work, e.g. casts between half and float fail. I'm guessing these cast failures have something to do with __CUDA_NO_HALF_OPERATORS__ and __CUDA_NO_HALF2_OPERATORS__ being defined in the compilation step?

Then I figured that to use half2 I would use e.g. at::Half2, but this doesn't seem to exist?

  • why do we use at::Half instead of half?
  • why are half casts etc removed?
  • what should we do if we want to use half2? (or is there some way to get the compiler to handle half2 vectorization for us, just as g++ handles SSE vectorization automatically in CPU code?)

Some questions are already answered in the cross post.

You should be able to use __half2 in your CUDA code as seen e.g. here.
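As a rough illustration of that suggestion, here is a minimal standalone sketch that views a buffer of __half values as __half2 pairs inside a kernel. The names double_half2 and max_error_double_half2 are made up for this example. It assumes a GPU with compute capability 5.3 or higher, a recent CUDA toolkit in which __float2half and __half2float are callable on the host, and a buffer aligned to 4 bytes. In a PyTorch extension the pointer would presumably come from reinterpreting the tensor's at::Half data as __half, which is an assumption about layout compatibility and not something shown here.

```cuda
#include <cuda_fp16.h>
#include <cuda_runtime.h>
#include <cmath>
#include <vector>

// Hypothetical kernel: out[i] = in[i] + in[i], two half values per thread.
// Assumes `in` and `out` are 4-byte aligned so they can be viewed as __half2.
__global__ void double_half2(const __half* in, __half* out, int n) {
    const int i = blockIdx.x * blockDim.x + threadIdx.x;
    const int n_pairs = n / 2;
    if (i < n_pairs) {
        const __half2* in2 = reinterpret_cast<const __half2*>(in);
        __half2* out2 = reinterpret_cast<__half2*>(out);
        const __half2 v = in2[i];
        out2[i] = __hadd2(v, v);  // one instruction handles both lanes
    }
    // If n is odd, the last element has no partner; thread 0 handles it as a scalar.
    if (i == 0 && (n & 1)) {
        out[n - 1] = __hadd(in[n - 1], in[n - 1]);
    }
}

// Host-side helper: runs the kernel on n values and returns the largest
// absolute difference from a float reference.
float max_error_double_half2(int n) {
    std::vector<__half> h_in(n), h_out(n);
    for (int i = 0; i < n; ++i) h_in[i] = __float2half(0.25f * i);

    __half *d_in = nullptr, *d_out = nullptr;
    cudaMalloc(&d_in, n * sizeof(__half));
    cudaMalloc(&d_out, n * sizeof(__half));
    cudaMemcpy(d_in, h_in.data(), n * sizeof(__half), cudaMemcpyHostToDevice);

    const int threads = 128;
    int blocks = (n / 2 + threads - 1) / threads;
    if (blocks < 1) blocks = 1;  // still need thread 0 for the odd tail
    double_half2<<<blocks, threads>>>(d_in, d_out, n);

    cudaMemcpy(h_out.data(), d_out, n * sizeof(__half), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);

    float max_err = 0.0f;
    for (int i = 0; i < n; ++i) {
        const float ref = 2.0f * __half2float(h_in[i]);
        max_err = std::fmax(max_err, std::fabs(__half2float(h_out[i]) - ref));
    }
    return max_err;
}
```

The scalar tail is there because a tensor with an odd number of elements cannot be covered by __half2 loads alone. Views with an odd storage offset would also break the alignment assumption.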

Sure, but there are no operators. For sigmoid, for example, I have to write:

__h2div(one, (__hadd2(one, h2exp(__hneg2(x)))))

instead of simply:

one / (one + h2exp(-x))
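One possible way to get the shorter spelling back without relying on the operators from cuda_fp16.h is a thin wrapper type that forwards to the intrinsics. The sketch below is only an illustration: H2, h2_exp, sigmoid_kernel and max_sigmoid_error are hypothetical names, not part of CUDA or PyTorch. It assumes compute capability 5.3 or higher and a recent CUDA toolkit. It also assumes that the intrinsics themselves (__hadd2, __h2div, __hneg2, h2exp) stay available when the __CUDA_NO_HALF2_OPERATORS__ macro is defined, since that macro is understood to affect only the overloaded operators.

```cuda
#include <cuda_fp16.h>
#include <cuda_runtime.h>
#include <cmath>
#include <vector>

// Hypothetical wrapper: carries a __half2 and defines its own operators,
// so it does not depend on the operators cuda_fp16.h may or may not provide.
struct H2 {
    __half2 v;
    __device__ explicit H2(__half2 x) : v(x) {}
};

__device__ inline H2 operator+(H2 a, H2 b) { return H2(__hadd2(a.v, b.v)); }
__device__ inline H2 operator/(H2 a, H2 b) { return H2(__h2div(a.v, b.v)); }
__device__ inline H2 operator-(H2 a)       { return H2(__hneg2(a.v)); }
__device__ inline H2 h2_exp(H2 a)          { return H2(h2exp(a.v)); }

// Sigmoid on both lanes at once, written with ordinary operator syntax.
__device__ inline H2 sigmoid(H2 x) {
    const H2 one(__float2half2_rn(1.0f));
    return one / (one + h2_exp(-x));
}

__global__ void sigmoid_kernel(const __half2* in, __half2* out, int n_pairs) {
    const int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n_pairs) out[i] = sigmoid(H2(in[i])).v;
}

// Host-side helper: returns the largest absolute difference between the
// half2 sigmoid and a float reference over inputs in [-4, 4).
float max_sigmoid_error(int n_pairs) {
    const int n = 2 * n_pairs;
    std::vector<__half> h_in(n), h_out(n);
    for (int i = 0; i < n; ++i) h_in[i] = __float2half(-4.0f + 8.0f * i / n);

    __half2 *d_in = nullptr, *d_out = nullptr;
    cudaMalloc(&d_in, n_pairs * sizeof(__half2));
    cudaMalloc(&d_out, n_pairs * sizeof(__half2));
    cudaMemcpy(d_in, h_in.data(), n * sizeof(__half), cudaMemcpyHostToDevice);

    const int threads = 128;
    sigmoid_kernel<<<(n_pairs + threads - 1) / threads, threads>>>(d_in, d_out, n_pairs);

    cudaMemcpy(h_out.data(), d_out, n * sizeof(__half), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);

    float max_err = 0.0f;
    for (int i = 0; i < n; ++i) {
        const float x = __half2float(h_in[i]);
        const float ref = 1.0f / (1.0f + std::exp(-x));
        max_err = std::fmax(max_err, std::fabs(__half2float(h_out[i]) - ref));
    }
    return max_err;
}
```

Another option sometimes used in extension builds is to pass flags such as -U__CUDA_NO_HALF_OPERATORS__ and -U__CUDA_NO_HALF2_OPERATORS__ to nvcc so that the built-in operators come back. Whether that is safe alongside the at::Half overloads is worth checking for the PyTorch version in use.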