I’m writing a plugin for a graphics application in which I can copy data directly from a CUDA kernel to a CUDA surface abstraction provided by the host program. Is there a straightforward way to copy data directly from a torch::Tensor object on the GPU to such a CUDA surface object?
Any suggestions or insights would be greatly appreciated.
I would like to avoid the copy to CPU memory, though, and keep the data on the GPU (for performance). Basically, I’m evaluating a relatively lightweight model that produces output many times per second, so the round trip to the CPU and back is a substantial hit. I’m looking for a GPU-to-(same)-GPU solution. I’ve succeeded with this approach in Python, but I’m trying to shave off a few more milliseconds of processing/copying time.
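For reference, this is roughly the kind of kernel I have in mind: launch a custom kernel that reads straight from the tensor's device pointer and writes into the surface with `surf2Dwrite`, so nothing touches the CPU. This is only a sketch under my own assumptions (contiguous HxWx4 float tensor, a `cudaSurfaceObject_t` with a float4 channel format supplied by the host app); the names are illustrative.

```cuda
#include <cuda_runtime.h>
#include <torch/torch.h>

// Sketch: write a contiguous HxWx4 float tensor into a 2D CUDA surface.
// `surf` is assumed to be a cudaSurfaceObject_t provided by the host app,
// backed by a float4-format cudaArray of at least width x height.
__global__ void tensor_to_surface(const float* __restrict__ src,
                                  cudaSurfaceObject_t surf,
                                  int width, int height) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;

    int idx = (y * width + x) * 4;  // assumed RGBA float layout
    float4 px = make_float4(src[idx], src[idx + 1],
                            src[idx + 2], src[idx + 3]);
    // surf2Dwrite takes the x coordinate in bytes, not elements.
    surf2Dwrite(px, surf, x * (int)sizeof(float4), y);
}

// Host-side launch: reads directly from the tensor's GPU memory,
// no CPU round trip.
void copy_tensor_to_surface(const torch::Tensor& t, cudaSurfaceObject_t surf) {
    auto tc = t.contiguous();              // ensure dense layout on device
    int height = (int)tc.size(0);
    int width  = (int)tc.size(1);
    dim3 block(16, 16);
    dim3 grid((width + block.x - 1) / block.x,
              (height + block.y - 1) / block.y);
    tensor_to_surface<<<grid, block>>>(tc.data_ptr<float>(), surf,
                                       width, height);
}
```

If the host application hands you a `cudaArray` rather than a surface object, I believe you could instead skip the kernel and use `cudaMemcpy2DToArray` with `cudaMemcpyDeviceToDevice` from `tensor.data_ptr()`, but I haven't measured which is faster in this setup.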