How to use half2 in custom kernels?

Some questions are already answered in the cross post.

You should be able to use __half2 in your CUDA code by including cuda_fp16.h, as seen e.g. here.
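For illustration, here is a minimal standalone sketch of a kernel operating on `__half2`. It is not taken from the linked code. The kernel name `axpy_half2`, the helper `max_abs_error_half2_axpy`, and the launch configuration are made up for this example. It assumes a reasonably recent CUDA toolkit (where the `__half2` conversion functions are callable from the host) and a GPU with native half arithmetic (compute capability 5.3 or newer), so compile with something like `nvcc -arch=sm_70 example.cu`.

```cuda
#include <cuda_fp16.h>
#include <cuda_runtime.h>
#include <cmath>
#include <cstdio>
#include <vector>

// Hypothetical kernel: y = a * x + y, where each element packs two half values.
// __hfma2 performs a fused multiply-add on both halves at once.
__global__ void axpy_half2(int n2, __half2 a, const __half2* x, __half2* y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n2) {
        y[i] = __hfma2(a, x[i], y[i]);
    }
}

// Hypothetical host helper: runs the kernel on n half values (n must be even)
// and returns the largest absolute deviation from the expected result.
// Returns INFINITY if any CUDA call fails.
float max_abs_error_half2_axpy(int n) {
    const int n2 = n / 2;  // number of __half2 elements
    const size_t bytes = n2 * sizeof(__half2);

    // Inputs chosen so that the results are exactly representable in half.
    std::vector<__half2> hx(n2, __floats2half2_rn(0.5f, 0.25f));
    std::vector<__half2> hy(n2, __floats2half2_rn(1.0f, 2.0f));
    const __half2 a = __float2half2_rn(2.0f);  // broadcast 2.0 to both halves

    __half2* dx = nullptr;
    __half2* dy = nullptr;
    if (cudaMalloc(&dx, bytes) != cudaSuccess) return INFINITY;
    if (cudaMalloc(&dy, bytes) != cudaSuccess) { cudaFree(dx); return INFINITY; }

    cudaMemcpy(dx, hx.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dy, hy.data(), bytes, cudaMemcpyHostToDevice);

    const int threads = 256;
    const int blocks = (n2 + threads - 1) / threads;
    axpy_half2<<<blocks, threads>>>(n2, a, dx, dy);

    // The blocking copy back also surfaces kernel launch/execution errors.
    cudaError_t err = cudaMemcpy(hy.data(), dy, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dx);
    cudaFree(dy);
    if (err != cudaSuccess) return INFINITY;

    // Expected per pair: (2 * 0.5 + 1, 2 * 0.25 + 2) = (2.0, 2.5)
    float max_err = 0.0f;
    for (int i = 0; i < n2; ++i) {
        const float2 v = __half22float2(hy[i]);
        max_err = fmaxf(max_err, fabsf(v.x - 2.0f));
        max_err = fmaxf(max_err, fabsf(v.y - 2.5f));
    }
    return max_err;
}

int main() {
    printf("max abs error: %f\n", max_abs_error_half2_axpy(1024));
    return 0;
}
```

If your data comes in as a plain `__half*` buffer, you would typically reinterpret it as `__half2*` (the pointer has to be 4-byte aligned and the element count even) and handle a possible odd tail element separately.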