CUDA tensor to CUDA surface

I’m writing a plugin for a graphics application where I am able to copy data directly from a CUDA kernel to a CUDA surface abstraction provided by the host program. Is there a straightforward way to copy data directly from a torch::Tensor object on the GPU to a CUDA surface abstraction object?

Any suggestions or insights would be greatly appreciated.


You can call .cpu() to get a copy of the Tensor in host memory. Calling .data_ptr() on that host memory tensor will give you a pointer to the flattened data in host memory.
The numpy conversion code in C++ provides an example of how this is done:


I would like to avoid the copy to CPU memory though and keep the data on the GPU (for performance). Basically I’m evaluating a relatively lightweight model that produces output many times per second, so the copy to the CPU and back again is a substantial hit. I’m looking for a GPU to (same) GPU solution… I’ve succeeded with this approach in Python, but I’m trying to shave off a few more milliseconds of processing/copying time.

You should be able to access a GPU data_ptr() in your own CUDA kernels in a similar way.
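
For anyone landing here later, a rough sketch of what that can look like. This is not code from the host application or from PyTorch itself. It assumes the host hands you a cudaSurfaceObject_t backed by a 2D array of four 32-bit float channels, and that the tensor is float32 with shape [H, W, 4]. The names tensor_to_surface and copy_tensor_to_surface are made up for the example, so adjust the texel type and layout to whatever your surface actually uses.

```cuda
#include <torch/torch.h>
#include <c10/cuda/CUDAStream.h>
#include <cuda_runtime.h>

// Hypothetical kernel: one thread per texel, reading RGBA floats from the
// tensor's device memory and writing them to the surface.
__global__ void tensor_to_surface(const float* src, cudaSurfaceObject_t surf,
                                  int width, int height) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;

    // Row-major [H, W, 4] layout: 4 floats per pixel.
    const float* p = src + 4 * (y * width + x);
    float4 texel = make_float4(p[0], p[1], p[2], p[3]);

    // surf2Dwrite expects the x coordinate as a byte offset.
    surf2Dwrite(texel, surf, x * static_cast<int>(sizeof(float4)), y);
}

// Hypothetical host-side wrapper. Assumes the surface is at least W x H texels
// and lives on the same device as the tensor.
void copy_tensor_to_surface(const torch::Tensor& t, cudaSurfaceObject_t surf) {
    TORCH_CHECK(t.is_cuda(), "tensor must be on the GPU");
    TORCH_CHECK(t.scalar_type() == torch::kFloat32, "tensor must be float32");
    TORCH_CHECK(t.dim() == 3 && t.size(2) == 4, "expected shape [H, W, 4]");

    // data_ptr() only describes a flat buffer if the tensor is contiguous.
    torch::Tensor c = t.contiguous();
    int height = static_cast<int>(c.size(0));
    int width = static_cast<int>(c.size(1));

    dim3 block(16, 16);
    dim3 grid((width + block.x - 1) / block.x, (height + block.y - 1) / block.y);

    // Launch on PyTorch's current stream so the copy is ordered after the
    // kernels that produced the tensor.
    cudaStream_t stream = c10::cuda::getCurrentCUDAStream().stream();
    tensor_to_surface<<<grid, block, 0, stream>>>(c.data_ptr<float>(), surf,
                                                 width, height);
}
```

The data never leaves the GPU here: data_ptr<float>() on a CUDA tensor is a device pointer, so the kernel reads it directly. If the host program reads the surface on a different stream, you would still need some synchronization between the two (an event or a stream sync), which depends on how the host exposes its own stream.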

Yep, that’s exactly what I ended up doing. Thanks!