Can PyTorch directly access GPU data computed by a CUDA C kernel?

Hi all,

I have a data processor written in CUDA C, and I use PyCUDA as the API to call it. So essentially, the kernel creates the data on the GPU:

  __global__ void create_data(float* data)
  {
      // process the data inside the CUDA kernel and write the result to data
  }

The data is processed inside the CUDA kernel function, which finally updates this float* data, which I guess sits in global memory on the GPU. Now, the straightforward way is to copy this data back to the CPU host, which is easy with CUDA. Then I call PyTorch to send the data from the CPU back to the GPU, using something like .to("cuda"), which is also easy.

But obviously this is redundant if PyTorch has some API that can directly access the float* data in GPU memory, so that the program does not need to do this unnecessary GPU->CPU->GPU data transfer. Is there such an API?

If you have a CUDA tensor t in PyTorch, t.data_ptr<float>() will give you the float* to the tensor’s memory, which you can pass to your kernel. You would have to deal with sizes, strides, etc. yourself.
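
To make the "sizes, strides" remark concrete, here is a minimal Python sketch of the address arithmetic you take on when working from a raw pointer. The helper name element_address is hypothetical and the base address is made up. On the Python side, t.data_ptr() returns the base address as an integer, t.stride() the strides in elements, and t.element_size() the bytes per element.

```python
def element_address(base_ptr, index, strides, itemsize):
    """Byte address of the element at `index` in a strided tensor.

    base_ptr : integer address of the first element (e.g. t.data_ptr())
    index    : tuple of per-dimension indices
    strides  : tuple of per-dimension strides in elements (e.g. t.stride())
    itemsize : bytes per element (e.g. t.element_size(), 4 for float32)
    """
    if len(index) != len(strides):
        raise ValueError("index and strides must have the same length")
    # Strides are counted in elements, so convert to bytes at the end.
    offset = sum(i * s for i, s in zip(index, strides))
    return base_ptr + offset * itemsize


# A contiguous 3x4 float32 tensor has strides (4, 1) in elements.
# The base address below is made up for illustration.
base = 0x1000
addr = element_address(base, (2, 1), (4, 1), 4)    # element [2, 1]

# Its transpose (4x3) shares the same memory with strides (1, 4),
# so element [1, 2] of the transpose is the same location.
addr_t = element_address(base, (1, 2), (1, 4), 4)
```

A kernel that assumes a dense row-major layout will read the wrong elements from a transposed or sliced tensor, so it is usually simplest to call t.contiguous() before handing the pointer over.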

There is also from_blob, which makes a tensor that points to a given memory blob without copying it. You have to guarantee that the pointer stays valid for the tensor’s lifetime (that is, until PyTorch calls the deleter callback you can pass in, signalling that it is done with the memory). TensorOptions lets you specify that it is a GPU tensor, the dtype, etc.
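
from_blob belongs to the C++ API. Since the question uses PyCUDA from Python, one possible route there (an assumption on my part, not something stated above) is to describe the existing device memory through the CUDA Array Interface, which torch.as_tensor is, to my knowledge, able to consume without a copy. The class name DevicePointerView and the address are hypothetical; treat this as a sketch to verify against your PyTorch and PyCUDA versions.

```python
class DevicePointerView:
    """Hypothetical holder that describes raw float32 GPU memory."""

    def __init__(self, ptr, shape, owner=None):
        self.ptr = int(ptr)        # device address as an integer
        self.shape = tuple(shape)  # element counts per dimension
        self.owner = owner         # reference that keeps the allocation alive

    @property
    def __cuda_array_interface__(self):
        return {
            "shape": self.shape,
            "typestr": "<f4",           # little-endian float32
            "data": (self.ptr, False),  # (address, read_only)
            "strides": None,            # None means C-contiguous
            "version": 2,
        }


# The address here is made up; a real one would come from the allocation
# that the create_data kernel wrote into.
view = DevicePointerView(0x7F0000000000, (1024,))
interface = view.__cuda_array_interface__

# With a CUDA device and a real pointer, the wrapping step would be:
#   tensor = torch.as_tensor(view, device="cuda")
```

The same lifetime rule applies as with from_blob: the memory must outlive the tensor, which is why the sketch keeps a reference to the owning allocation. PyCUDA and PyTorch also need to be using the same CUDA context for the pointer to be meaningful on both sides.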

Best regards