Hello community, I've encountered a problem related to type casting.
In my Python file, I want to call the function func with:
a = torch.rand(size).to(torch.float16).to("cuda:0")
b = torch.rand(size).to(torch.float16).to("cuda:0")
func(a, b)
Where func is defined in a .cu file as follows:
void func(const torch::Tensor input, torch::Tensor output);
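(For completeness, func is exposed to Python through a standard pybind11 binding via torch/extension.h; the snippet below is only roughly what I have, with names simplified:)

#include <torch/extension.h>

void func(const torch::Tensor input, torch::Tensor output);

PYBIND11_MODULE(TORCH_EXTENSION_NAME, m) {
    // expose the C++/CUDA function to Python
    m.def("func", &func, "run the fp16 kernel");
}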
Inside func, I call a CUDA kernel:
__global__ void kernel(const half* _input, half* _output);
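(The kernel body itself should not matter for this question; a minimal element-wise copy with the same signature would look like this, assuming the launch configuration covers exactly the number of elements, since the signature carries no length argument:)

#include <cuda_fp16.h>

// Minimal stand-in kernel: element-wise copy of half values.
__global__ void kernel(const half* _input, half* _output) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    _output[idx] = _input[idx];
}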
So, to implement func, I do the following:
void func(const torch::Tensor input, torch::Tensor output) {
    // This half type is defined in cuda_fp16.h
    half* __input = reinterpret_cast<half*>(input.data_ptr<half>());
    half* __output = reinterpret_cast<half*>(output.data_ptr<half>());
    // block and thread are the launch configuration, computed elsewhere
    kernel<<<block, thread>>>(__input, __output);
    cudaDeviceSynchronize();
}
But when I execute this, I encounter the error:
lilac.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZNK2at10TensorBase8data_ptrI6__halfEEPT_v
I guess this is caused by the type casting. Which data type should I use in the .cu file, or how should I change the tensor's dtype in the Python file?
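For reference, the only alternative I could think of is to request the pointer as at::Half (which ATen does seem to know about) and then reinterpret it as CUDA's half, but I am not sure whether this is correct:

// Hypothetical variant: only the two pointer casts change.
// at::Half and CUDA's half are both 16-bit fp16 types, so the
// reinterpret_cast should only change the pointer type, not the data.
half* __input = reinterpret_cast<half*>(input.data_ptr<at::Half>());
half* __output = reinterpret_cast<half*>(output.data_ptr<at::Half>());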