I have a CUDA kernel and I want to add FP16 support to it. Any idea where I should start?
What is your use case? Would you like to perform the computation in FP16, or in pseudo-FP16, i.e. FP32 math on FP16 inputs? Also, what kinds of operations are you using inside your kernel?
Have a look at nvidia/apex for some use cases.
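To make the distinction concrete, here is a rough sketch of the two variants for a toy scaling kernel (the kernel names are made up for illustration; the pseudo-FP16 version stores values in half but widens to float for the math):

```cpp
#include <cuda_fp16.h>

// Pseudo-FP16: FP16 storage, FP32 math. Each input is loaded as __half,
// widened to float for the arithmetic, then narrowed back on store.
__global__ void scale_pseudo_fp16(const __half* in, __half* out,
                                  float alpha, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) {
    out[i] = __float2half(alpha * __half2float(in[i]));
  }
}

// True FP16: the multiply itself runs in half precision via the __hmul
// intrinsic (requires compute capability 5.3+).
__global__ void scale_fp16(const __half* in, __half* out,
                           __half alpha, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) {
    out[i] = __hmul(alpha, in[i]);
  }
}
```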
Thanks for your reply. I want to perform the computation in FP16 directly. I have found a solution:
AT_DISPATCH_FLOATING_TYPES_AND(at::ScalarType::Half, ...), which seems preferable given that
DISPATCH_DOUBLE_FLOAT_AND_HALF is being deprecated.
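In case it helps anyone later, here is roughly how I ended up wiring the macro into the extension — a minimal sketch, with a placeholder square kernel standing in for my real one:

```cpp
#include <torch/extension.h>

// Placeholder elementwise kernel; scalar_t is whatever the dispatch
// macro selects (float, double, or at::Half).
template <typename scalar_t>
__global__ void square_kernel(const scalar_t* __restrict__ in,
                              scalar_t* __restrict__ out,
                              int64_t n) {
  int64_t i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) {
    out[i] = in[i] * in[i];
  }
}

torch::Tensor square_cuda(torch::Tensor input) {
  auto output = torch::empty_like(input);
  const int64_t n = input.numel();
  const int threads = 256;
  const int blocks = static_cast<int>((n + threads - 1) / threads);

  // The extra ScalarType argument adds Half to the usual float/double
  // dispatch, so the lambda is also instantiated with scalar_t = at::Half.
  AT_DISPATCH_FLOATING_TYPES_AND(
      at::ScalarType::Half, input.scalar_type(), "square_cuda", ([&] {
        square_kernel<scalar_t><<<blocks, threads>>>(
            input.data_ptr<scalar_t>(),
            output.data_ptr<scalar_t>(),
            n);
      }));
  return output;
}
```

One caveat I ran into: as far as I can tell, at::Half's overloaded operators promote to float in device code, so if you want the arithmetic itself to run in half precision you may still need to reinterpret the pointers as __half inside the Half branch and use the intrinsics from cuda_fp16.h.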