I’m writing a plugin for a graphics application where I am able to copy data directly from a CUDA kernel to a CUDA surface abstraction provided by the host program. Is there a straightforward way to copy data directly from a torch::Tensor object on the GPU to a CUDA surface object?
Any suggestions or insights would be greatly appreciated
You can call .cpu() to get a copy of the tensor in host memory. Calling .data_ptr() on that host tensor gives you a pointer to the flattened data in host memory.
The numpy conversion code in C++ provides an example of how this is done: https://github.com/pytorch/pytorch/blob/fbe2a7e50a940ba7a12b003241a2699f7a731afb/torch/csrc/utils/tensor_numpy.cpp#L164
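As a minimal sketch of that approach (assuming a float32 tensor; the function name is just for illustration):

```cpp
#include <torch/torch.h>

void read_on_host(const torch::Tensor& gpu_tensor) {
    // .contiguous() ensures the flattened layout matches the logical element
    // order before copying; .cpu() performs the device-to-host copy.
    torch::Tensor cpu_tensor = gpu_tensor.contiguous().cpu();
    const float* data = cpu_tensor.data_ptr<float>();
    int64_t n = cpu_tensor.numel();
    // data[0] .. data[n-1] can now be passed to any host-side API.
    // Note: `data` is only valid while `cpu_tensor` stays alive.
}
```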
I would like to avoid the copy to CPU memory, though, and keep the data on the GPU for performance. Basically, I’m evaluating a relatively lightweight model that produces output many times per second, so the round trip to the CPU and back is a substantial hit. I’m looking for a GPU-to-(same-)GPU solution. I’ve succeeded with this approach in Python, but I’m trying to shave off a few more milliseconds of processing/copying time.
You should be able to access a GPU data_ptr() in your own CUDA kernels in a similar way.
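A sketch of what that might look like, under some assumptions not stated in the thread: a contiguous float32 HxWx4 (RGBA) tensor already on the GPU, and a cudaSurfaceObject_t named surf that the host application has already created over a float4-format array. The kernel and wrapper names are hypothetical:

```cpp
#include <torch/torch.h>
#include <cuda_runtime.h>

// Hypothetical kernel: copy one float4 pixel per thread from the tensor's
// raw device buffer into the surface.
__global__ void tensor_to_surface(const float4* src, cudaSurfaceObject_t surf,
                                  int width, int height) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < width && y < height) {
        float4 px = src[y * width + x];
        // surf2Dwrite takes the x coordinate in BYTES, not elements.
        surf2Dwrite(px, surf, x * (int)sizeof(float4), y);
    }
}

void copy_tensor_to_surface(const torch::Tensor& t, cudaSurfaceObject_t surf) {
    // .contiguous() guarantees the flat row-major layout the kernel assumes.
    torch::Tensor tc = t.contiguous();
    int height = (int)tc.size(0);
    int width  = (int)tc.size(1);
    dim3 block(16, 16);
    dim3 grid((width + block.x - 1) / block.x,
              (height + block.y - 1) / block.y);
    tensor_to_surface<<<grid, block>>>(
        reinterpret_cast<const float4*>(tc.data_ptr<float>()),
        surf, width, height);
}
```

One caveat: PyTorch issues its work on its own CUDA stream, so launching this kernel on the default stream (as above) may race with the model’s output. Launching on at::cuda::getCurrentCUDAStream() instead, or synchronizing first, avoids that.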
Yep, that’s exactly what I ended up doing. Thanks!