Convert fp32 tensor to fp16 in cuda kernel

I have a TorchScript model with fp16 precision, so I must feed fp16 data to the model for inference. I convert an fp32 image to fp16 in a CUDA kernel using the `__float2half()` function, but `__float2half()` has more than one version, such as `__float2half`, `__float2half_rd`, `__float2half_rz`, … so which version should I use?

Assuming you don't want to (or cannot) convert the tensors to torch.float16 and want to use the CUDA intrinsics directly, it depends on the rounding mode you want: `__float2half()` and `__float2half_rn()` round to nearest even, `__float2half_rz()` rounds toward zero, `__float2half_rd()` rounds toward negative infinity, and `__float2half_ru()` rounds toward positive infinity. If you just want to match the result of calling `.half()` on the tensor in PyTorch, `__float2half()` (round-to-nearest-even) should be the right choice.
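
A minimal sketch of such a conversion kernel (the kernel name and launch configuration are illustrative, not from your code):

```cuda
#include <cuda_fp16.h>

// Hypothetical kernel: cast an fp32 buffer to fp16 element-wise.
// __float2half() uses round-to-nearest-even, the IEEE default mode.
__global__ void float_to_half_kernel(const float* in, __half* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        out[i] = __float2half(in[i]);        // round to nearest even
        // Alternatives, if you need a specific rounding mode:
        // out[i] = __float2half_rz(in[i]);  // round toward zero
        // out[i] = __float2half_rd(in[i]);  // round down (toward -inf)
        // out[i] = __float2half_ru(in[i]);  // round up (toward +inf)
    }
}

// Example launch for n elements:
// float_to_half_kernel<<<(n + 255) / 256, 256>>>(d_in, d_out, n);
```

For most inference pipelines the differences between these modes are at most one ULP per element, so unless you need bit-exact agreement with a reference implementation, the default `__float2half()` is fine.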