I would like to know how to integrate the CUDA accessor example given here
https://pytorch.org/cppdocs/notes/tensor_basics.html#cuda-accessors
in a CMake project. Just copy-pasting the code into a cpp
file does not compile.
I don’t know what exactly is failing, as you didn’t share the error message, but maybe the right includes are missing?
Just pasting the code into a .cpp file and compiling it with Clang or GCC does not work because of the `__global__` keyword. I assume that I need to place the kernel in a separate .cu file and compile it with nvcc into a library. How is this done for the CUDA kernels in the C++ API’s libraries?
You might need to add CUDA as a language to your CMakeLists.txt, e.g. via:

project(custom_ops LANGUAGES CXX CUDA)

Also make sure the file extension is .cu for your custom CUDA code.
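A minimal CMakeLists.txt along those lines might look like the sketch below. The project name and source file names (main.cpp, kernels.cu) are placeholders; it assumes you use libtorch via find_package(Torch), which provides TORCH_LIBRARIES, and that CMake dispatches .cu files to nvcc once CUDA is enabled as a language:

```cmake
cmake_minimum_required(VERSION 3.18)
project(custom_ops LANGUAGES CXX CUDA)

# Locate libtorch; pass -DCMAKE_PREFIX_PATH=/path/to/libtorch when configuring.
find_package(Torch REQUIRED)

# Host code goes in .cpp files (compiled by g++/clang++),
# kernel code in .cu files (compiled by nvcc).
add_executable(custom_ops main.cpp kernels.cu)
target_link_libraries(custom_ops "${TORCH_LIBRARIES}")
set_property(TARGET custom_ops PROPERTY CXX_STANDARD 17)
```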
I’m still trying to reverse-engineer how Torch does it internally. Assume that I have a C++ application written in the file app.cxx. Depending on whether the tensors are on the CPU or GPU, I need to dispatch between a CPU accessor and a GPU accessor variant, e.g.
auto a = torch::zeros({10});
if (a.is_cuda())
{
  auto a_accessor = a.packed_accessor64<float, 1>();
  packed_accessor_kernel<<<1, 10>>>(a_accessor); // does not compile with g++!!!
}
else
{
  auto a_accessor = a.accessor<float, 1>();
  for (int64_t i = 0; i < 10; ++i)
    a_accessor[i] = i;
}
The corresponding CUDA kernel would read as follows:
__global__ void packed_accessor_kernel(
torch::PackedTensorAccessor64<float, 1> a) {
int i = threadIdx.x;
a[i] = i;
}
I could put the CUDA kernel into a .cu file and put the entire dispatch routine into a library that I link to my application. But this would introduce some overhead, as I would have to call the dispatch routine instead of implementing it inline (so far all my code is header-only). Any help on how this is done efficiently in Torch is appreciated.
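The .cu-file split described above could be sketched as follows. The file names (kernels.cu, fill.h) and the wrapper function launch_packed_accessor_kernel are hypothetical, not Torch's internal scheme: the idea is that only the kernel and its launch site are compiled by nvcc, while the header declares a plain C++ function, so the header-only dispatch stays includable from g++-compiled translation units:

```cpp
// kernels.cu -- compiled by nvcc
#include <torch/torch.h>

__global__ void packed_accessor_kernel(
    torch::PackedTensorAccessor64<float, 1> a) {
  int i = threadIdx.x;
  a[i] = i;
}

// Plain C++ launcher: the <<<...>>> syntax is confined to this file.
void launch_packed_accessor_kernel(torch::Tensor a) {
  auto a_accessor = a.packed_accessor64<float, 1>();
  packed_accessor_kernel<<<1, 10>>>(a_accessor);
}
```

```cpp
// fill.h -- header-only dispatch, safe to include from app.cxx;
// g++ only ever sees the launcher's declaration.
#include <torch/torch.h>

void launch_packed_accessor_kernel(torch::Tensor a);

inline void fill_iota(torch::Tensor a) {
  if (a.is_cuda()) {
    launch_packed_accessor_kernel(a);
  } else {
    auto a_accessor = a.accessor<float, 1>();
    for (int64_t i = 0; i < 10; ++i)
      a_accessor[i] = i;
  }
}
```

The remaining overhead is one non-inlined function call on the CUDA path, which is negligible next to the kernel launch itself.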