Passing a multidimensional ATen Tensor to a CUDA kernel and accessing its elements

Error: expression must have pointer-to-object type

I am trying to write my own CUDA function for PyTorch following @goldsborough's tutorial: https://pytorch.org/tutorials/advanced/cpp_extension.html

I have to pass a 2D tensor out_attributes to the kernel and access its values element-wise:

template <typename scalar_t>
__global__ void sort_in_grid_forward_kernel(
    const scalar_t* __restrict__ point_cloud,
    scalar_t* __restrict__ out_attributes) {

  const int i = (blockIdx.x * blockDim.x) + threadIdx.x;
  const int j = (blockIdx.y * blockDim.y) + threadIdx.y;
  // const int index = j * num_points + i;

  if (i == j) {
    out_attributes[i][j] = -1;     // <-- error here
  } else {
    // ........... do some processing with the point cloud ....... //
    out_attributes[i][j] = value;  // <-- error here
  }
}


at::Tensor sort_in_grid_forward(
    at::Tensor point_cloud,      // (2048, 3)
    at::Tensor out_attributes) { // (2048, 2048)

  const dim3 threadsPerBlock(32, 32); // 1024 threads per block
  const dim3 numBlocks(64, 64);       // 64 * 32 = 2048 threads per axis, one per output element

  AT_DISPATCH_FLOATING_TYPES(point_cloud.type(), "sort_in_grid_forward_cuda", ([&] {
    sort_in_grid_forward_kernel<scalar_t><<<numBlocks, threadsPerBlock>>>(
        point_cloud.data<scalar_t>(),
        out_attributes.data<scalar_t>());
  }));

  return out_attributes;
}

I am getting the error on these lines:

out_attributes[i][j]= -1;
out_attributes[i][j]= value;

Is there an easier way to access the values of a multidimensional tensor inside the kernel than dealing with the raw memory layout and pointer arithmetic myself?
Can we convert an ATen tensor to a C++ STL vector and pass that to the kernel?

I understand that the error comes from accessing the array elements the wrong way; through a raw pointer it should be done like this:

const int i = (blockIdx.x * blockDim.x) + threadIdx.x;
const int j = (blockIdx.y * blockDim.y) + threadIdx.y;
const int index = j * num_points + i;

out_attributes[index] = value;
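For completeness, here is a minimal sketch of the kernel rewritten with flat indexing (assuming num_points is passed in as an extra argument; the bounds check and the placeholder value are my additions):

template <typename scalar_t>
__global__ void sort_in_grid_forward_kernel(
    const scalar_t* __restrict__ point_cloud,
    scalar_t* __restrict__ out_attributes,
    const int num_points) {

  const int i = (blockIdx.x * blockDim.x) + threadIdx.x;
  const int j = (blockIdx.y * blockDim.y) + threadIdx.y;
  if (i >= num_points || j >= num_points) return; // guard against out-of-range threads

  // Row-major offset: i * num_points + j addresses element [i][j];
  // note that j * num_points + i would address the transposed element [j][i].
  const int index = i * num_points + j;

  if (i == j) {
    out_attributes[index] = -1;
  } else {
    // ........... do some processing with the point cloud ....... //
    out_attributes[index] = 0; // placeholder for the computed value
  }
}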

But as the number of tensor dimensions grows, it becomes very difficult to visualize the memory layout of the tensor and compute these flat offsets by hand.

How can AT_DISPATCH_FLOATING_TYPES be used to pass a tensor to the kernel in the standard CUDA way?
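While writing this up I came across at::PackedTensorAccessor32 in the ATen headers, which looks like it is meant for exactly this: the accessor carries the tensor's sizes and strides into the kernel, so [i][j] indexing compiles. A minimal, untested sketch of how I think it would be wired into the dispatch (the bounds check and the placeholder value are mine):

template <typename scalar_t>
__global__ void sort_in_grid_forward_kernel(
    const at::PackedTensorAccessor32<scalar_t, 2, at::RestrictPtrTraits> point_cloud,
    at::PackedTensorAccessor32<scalar_t, 2, at::RestrictPtrTraits> out_attributes) {

  const int i = (blockIdx.x * blockDim.x) + threadIdx.x;
  const int j = (blockIdx.y * blockDim.y) + threadIdx.y;
  if (i >= out_attributes.size(0) || j >= out_attributes.size(1)) return;

  // Multidimensional indexing works because the accessor knows the strides.
  out_attributes[i][j] = (i == j) ? scalar_t(-1) : scalar_t(0); // 0 is a placeholder
}

// Host side: build the accessors inside the dispatch lambda.
AT_DISPATCH_FLOATING_TYPES(point_cloud.type(), "sort_in_grid_forward_cuda", ([&] {
  sort_in_grid_forward_kernel<scalar_t><<<numBlocks, threadsPerBlock>>>(
      point_cloud.packed_accessor32<scalar_t, 2, at::RestrictPtrTraits>(),
      out_attributes.packed_accessor32<scalar_t, 2, at::RestrictPtrTraits>());
}));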

When we pass a C++ array to a CUDA kernel as described in the CUDA C Programming Guide, section 2.2 Thread Hierarchy
https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html

we can access the elements like out_attributes[i][j][k] = value much more easily.
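For reference, the MatAdd kernel from that section looks roughly like this (quoting from memory, so details may be off; N is a compile-time constant there, which is why the [i][j] indexing compiles):

// Matrix addition example from the guide; A, B, C are statically sized arrays.
__global__ void MatAdd(float A[N][N], float B[N][N], float C[N][N]) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  int j = blockIdx.y * blockDim.y + threadIdx.y;
  if (i < N && j < N)
    C[i][j] = A[i][j] + B[i][j];
}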

I don't think you can pass host-side (CPU) C++ arrays to CUDA kernels, though; the data still has to live in device memory, and the [i][j] indexing above only works because the array dimensions are compile-time constants.