Type casting problem when using torch in C++/CUDA

Hello community, I've run into a problem related to type casting.
In my Python file, I want to call the function func like this:

a = torch.rand(size).to(torch.float16).to("cuda:0")
b = torch.rand(size).to(torch.float16).to("cuda:0")
func(a, b)

where func is defined in a .cu file as follows:

void func(const torch::Tensor input, torch::Tensor output);

Inside func, I call a CUDA kernel:

__global__ void kernel(const half* _input, half* _output);

So, to implement func, I do the following:

void func(const torch::Tensor input, torch::Tensor output) {
    // This half type is defined in cuda_fp16.h
    half* __input  = reinterpret_cast<half*>(input.data_ptr<half>());
    half* __output = reinterpret_cast<half*>(output.data_ptr<half>());
    kernel<<<block, thread>>>(__input, __output);
    cudaDeviceSynchronize();
}

But when I execute this, I encounter the error:

lilac.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZNK2at10TensorBase8data_ptrI6__halfEEPT_v

I guess this is caused by the type casting (the symbol demangles to `at::TensorBase::data_ptr<__half>() const`, which apparently is not exported by the library). Which data type should I use in the .cu file, or how should I change the tensor dtype on the Python side?

Try to use at::Half instead. `data_ptr<T>()` is only instantiated for PyTorch's own scalar types, and for float16 that type is `at::Half`, not CUDA's `__half` — that is why the symbol is undefined when the extension is loaded. `at::Half` and `__half` have the same size and bit layout, so you can fetch the pointer as `at::Half*` and `reinterpret_cast` it to `half*` for the kernel.
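A minimal sketch of the fix, reusing the `func` and `kernel` signatures from the question (the launch configuration `block, thread` is assumed to be defined elsewhere, as in the original snippet):

```cuda
#include <cuda_fp16.h>        // CUDA's half type
#include <torch/extension.h>  // torch::Tensor, at::Half

__global__ void kernel(const half* _input, half* _output);

void func(const torch::Tensor input, torch::Tensor output) {
    // data_ptr<at::Half>() is the instantiation PyTorch actually exports;
    // at::Half and __half are both 2-byte fp16, so the cast is safe.
    const half* __input = reinterpret_cast<const half*>(input.data_ptr<at::Half>());
    half* __output      = reinterpret_cast<half*>(output.data_ptr<at::Half>());
    kernel<<<block, thread>>>(__input, __output);
    cudaDeviceSynchronize();
}
```

Alternatively, `input.data_ptr()` with no template argument returns a `void*` you can cast directly, which avoids the instantiation issue entirely.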