How can I change my CUDA code to support mixed precision?

I have a CUDA kernel and I want to add FP16 support to it. Any idea where I should start?

What is your use case: would you like to perform the computation in true FP16, or in pseudo-FP16, i.e. FP32 math on FP16 inputs? Also, what kind of operations are you using inside your kernel?
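
To illustrate the distinction, here is a minimal sketch of the two approaches for a trivial elementwise scale. The kernel names are hypothetical; the point is only that "true FP16" keeps the arithmetic in half precision, while "pseudo-FP16" stores FP16 but widens to FP32 for the math:

```cpp
#include <cuda_fp16.h>

// True FP16: the arithmetic itself runs in half precision (requires sm_53+).
__global__ void scale_fp16(const __half* in, __half* out, __half alpha, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        out[i] = __hmul(in[i], alpha);        // half * half in hardware
    }
}

// Pseudo-FP16: FP16 storage, but the math is done in FP32 for accuracy.
__global__ void scale_pseudo_fp16(const __half* in, __half* out, float alpha, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float x = __half2float(in[i]);        // load as half, widen to float
        out[i] = __float2half(x * alpha);     // compute in float, store as half
    }
}
```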

Have a look at nvidia/apex for some use cases.

Thanks for your reply. I want to perform the computation in FP16 directly. I have found a solution, which is AT_DISPATCH_FLOATING_TYPES_AND(at::ScalarType::Half, ...); it seems to be the better choice given that DISPATCH_DOUBLE_FLOAT_AND_HALF is being deprecated.
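
For reference, here is a minimal sketch of how that macro is typically used in a custom extension. The `scale_kernel` / `scale_cuda` names are hypothetical; the macro binds `scalar_t` to `float`, `double`, or `at::Half` depending on the input tensor's dtype:

```cpp
#include <torch/extension.h>
#include <ATen/Dispatch.h>
#include <cuda_runtime.h>

// Hypothetical elementwise kernel; scalar_t may be float, double, or at::Half.
template <typename scalar_t>
__global__ void scale_kernel(const scalar_t* in, scalar_t* out, scalar_t alpha, int64_t n) {
    int64_t i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        out[i] = in[i] * alpha;
    }
}

torch::Tensor scale_cuda(torch::Tensor input, double alpha) {
    auto output = torch::empty_like(input);
    const int64_t n = input.numel();
    const int threads = 256;
    const int blocks = static_cast<int>((n + threads - 1) / threads);

    // Dispatch over float, double, and half; scalar_t is defined inside the lambda.
    AT_DISPATCH_FLOATING_TYPES_AND(at::ScalarType::Half, input.scalar_type(), "scale_cuda", [&] {
        scale_kernel<scalar_t><<<blocks, threads>>>(
            input.data_ptr<scalar_t>(),
            output.data_ptr<scalar_t>(),
            static_cast<scalar_t>(alpha),
            n);
    });
    return output;
}
```

Note that the `at::Half` arithmetic operators widen to FP32 under the hood, so for arithmetic that stays entirely in half precision you would still reach for the `__half` intrinsics shown earlier inside the kernel body.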
