I’m trying to write my own CUDA kernel, but I can’t compile it using the CPP extension. I am following the guide here, and I was able to get a C++ version working fine. The full code and error message are below:
__global__ void linear_forward_cuda(float *input, float *weight, float *bias, float *output) {
    const int tid = threadIdx.x;
    const int bid = blockIdx.x;
    const int m = blockDim.x;

    __shared__ float shared_sum[1024];
    shared_sum[tid] = weight[(bid * m) + tid] * input[tid];

    for (unsigned int i = m / 2; i > 0; i = i >> 1) {
        if (tid < i) {
            shared_sum[tid] += shared_sum[tid + i];
        }
        __syncthreads();
    }

    if (tid == 0) {
        output[bid] = shared_sum[0] + bias[bid];
    }
}
In function ‘at::Tensor linear_forward(at::Tensor, at::Tensor, at::Tensor)’:
/home/karthik/projects/torch_files/approx-mult/cuda/cuda_layers.cpp:58:26: error: expected primary-expression before ‘<’ token
58 | linear_forward_cuda<<<threads,blocks>>>(input, weight_transposed, bias, output);
I am calling this kernel from another .cpp file, and I am adding both files to setup.py as shown in the guide I linked above.
This kernel works just fine when I call it via another .cu file and compile that file using just NVCC. My guess is Ninja is trying to compile this file using gcc instead of using nvcc? Since this is all handled by CPP extension, I’m not sure how to fix this. Any help would be much appreciated.
EDIT: I tried renaming both files to .cu so that NVCC compiles both, and it seems to work. But I’m not sure if that is the right way to fix this.
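For context on why the rename helps: the `<<<...>>>` launch syntax is a CUDA language extension that only nvcc parses, and the error above is reported inside `cuda_layers.cpp`, a file that goes to the host compiler. A common alternative to renaming everything is to keep the launch in the .cu file behind an ordinary C++ function and have the .cpp file only declare and bind that function. Below is a minimal sketch of that layout, not a definitive implementation. The names `linear_forward_kernel` and `linear_forward_launcher` are hypothetical, and it assumes contiguous float32 CUDA tensors, a `weight` of shape `[out_features, in_features]`, and an `in_features` that is a power of two no larger than 1024.

```cuda
// Sketch of the .cu side (compiled by nvcc). All names here are illustrative.
#include <torch/extension.h>

// Same reduction as the kernel in the post. One barrier is added after the
// initial shared-memory write so every product is visible before the first
// reduction step reads it.
__global__ void linear_forward_kernel(const float *input, const float *weight,
                                      const float *bias, float *output) {
    const int tid = threadIdx.x;
    const int bid = blockIdx.x;
    const int m = blockDim.x;  // assumed: in_features, a power of two <= 1024

    __shared__ float shared_sum[1024];
    shared_sum[tid] = weight[(bid * m) + tid] * input[tid];
    __syncthreads();

    for (unsigned int i = m / 2; i > 0; i >>= 1) {
        if (tid < i) {
            shared_sum[tid] += shared_sum[tid + i];
        }
        __syncthreads();
    }

    if (tid == 0) {
        output[bid] = shared_sum[0] + bias[bid];
    }
}

// Host-side launcher with a plain C++ signature. The kernel launch lives
// here, so no <<<...>>> ever appears in a file the host compiler sees.
torch::Tensor linear_forward_launcher(torch::Tensor input,
                                      torch::Tensor weight,
                                      torch::Tensor bias) {
    const int out_features = static_cast<int>(weight.size(0));
    const int in_features = static_cast<int>(weight.size(1));
    auto output = torch::empty({out_features}, input.options());

    // Launch configuration order is <<<grid size, block size>>>:
    // one block per output element, one thread per input element.
    linear_forward_kernel<<<out_features, in_features>>>(
        input.data_ptr<float>(), weight.data_ptr<float>(),
        bias.data_ptr<float>(), output.data_ptr<float>());
    return output;
}

// The .cpp side (host compiler) would then only need a declaration and a
// binding, for example:
//
//   torch::Tensor linear_forward_launcher(torch::Tensor input,
//                                         torch::Tensor weight,
//                                         torch::Tensor bias);
//
//   PYBIND11_MODULE(TORCH_EXTENSION_NAME, m) {
//       m.def("linear_forward", &linear_forward_launcher);
//   }
```

With this split, both files can stay in the `sources` list of setup.py under their original extensions, since each one only contains code its own compiler understands. Renaming everything to .cu also works; the split mainly keeps host-only code out of nvcc.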