Create a torch tensor with a device pointer

Is it possible to create a torch cuda float tensor from gpu memory?
The following code creates a NumPy array and copies it to GPU memory. Is it possible to create a torch tensor from that GPU memory directly, without copying it back to the host as a NumPy array first? The example code doesn't show the real use case: in my application I run some computations on the inputs on the GPU, and then I would like to apply one of the torch functionals to the result of those computations. Currently I copy the intermediate result back to the host, create a torch tensor from it, and move that tensor back to the GPU. If I could create a CUDA tensor from the device pointer directly, I would avoid two copy operations.

from cuda import cudart, cuda
import numpy as np
import torch

# create a numpy array of shape (6, 3, 640, 640) with random values between 0 and 1 (float32) with C order
x = np.random.rand(6, 3, 640, 640).astype(np.float32, order="C")

# allocate memory on the GPU
# (the cuda-python bindings return an (error, result) tuple)
err, x_gpu = cudart.cudaMalloc(x.nbytes)

# copy the numpy array to the GPU (pass the host pointer explicitly)
err, = cudart.cudaMemcpy(x_gpu, x.ctypes.data, x.nbytes, cudart.cudaMemcpyKind.cudaMemcpyHostToDevice)

# create an uninitialized cuda float tensor with the same shape as x
# (torch.cuda.FloatTensor is deprecated; x.size would pass the total
# element count rather than the shape)
x_torch = torch.empty(x.shape, dtype=torch.float32, device="cuda")

Will it be possible to copy the data from x_gpu to x_torch?

You could try torch.utils.dlpack.from_dlpack(ext_tensor), which accepts any producer that implements the DLPack protocol (for example, a CuPy array wrapping your raw device pointer), or torch.frombuffer, but note that the latter requires an object implementing the buffer protocol and therefore produces a CPU tensor.
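As a CPU-side illustration of the zero-copy behavior (NumPy implements `__dlpack__` since 1.22, and `bytearray` exposes the buffer protocol), this sketch shows that both calls share memory with the source object rather than copying; on the GPU, the same `from_dlpack` call works with any CUDA-aware producer:

```python
import numpy as np
import torch

# Zero-copy via DLPack: NumPy arrays implement __dlpack__,
# so torch wraps the same memory instead of copying it.
x = np.arange(6, dtype=np.float32)
t = torch.utils.dlpack.from_dlpack(x)
t[0] = 42.0
print(x[0])  # 42.0 -- the numpy array sees the write

# Zero-copy via the buffer protocol (CPU only): torch.frombuffer
buf = bytearray(4 * 4)  # room for 4 float32 values
u = torch.frombuffer(buf, dtype=torch.float32)
u[1] = 7.0
print(np.frombuffer(buf, dtype=np.float32)[1])  # 7.0 -- same bytes
```

For the GPU case, one common route is to wrap the raw pointer in an object exposing `__cuda_array_interface__` or `__dlpack__` (CuPy can do this) and then hand it to `from_dlpack`, so no device-to-host round trip is needed.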
