How can I get access to the raw GPU data of a tensor for pyCUDA? And how do I convert back?

I want to do some computation on tensors which I cannot do with PyTorch without copying memory to the CPU. For this, I want to use scikit-cuda, which in turn relies on pyCUDA. pyCUDA has a GPUArray class which I somehow need to instantiate from the memory occupied by a tensor. So far I’ve only found the _cdata field on tensors and Storage classes, which appears to return some kind of address.
What is the exact nature of this address? Does it point to a contiguous block of memory, and could I use it to reinterpret the data as a different type?
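
For illustration, this is roughly what I am looking at so far (data_ptr() also seems to exist, if that is the right accessor):

import torch

t = torch.empty(16).cuda()
print(t._cdata)      # the internal field I found; some integer
print(t.data_ptr())  # also an integer -- is this the raw CUDA device address?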

Hi,

I think this question has already been answered in this post.
You can find here an example on how to do it.

Unfortunately, the other direction is not addressed in the gist, i.e. how to create a tensor from a gpuarray.

Unfortunately, that is not possible at the moment.
The only workaround is to create a Tensor of the proper size, convert it to a gpuarray, and have your pyCUDA code modify that gpuarray in place, as sketched below.
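
A minimal sketch of that pattern, assuming a CUDA context is already set up (e.g. via pycuda.autoinit) and a tensor_to_gpuarray helper like the one from the linked gist (names are illustrative):

import torch
from pycuda.elementwise import ElementwiseKernel

out = torch.empty(1024).cuda()  # preallocate the result tensor on the GPU
arr = tensor_to_gpuarray(out)   # wraps the tensor's memory without copying

# trivial kernel that doubles every element, writing through to `out`
double_it = ElementwiseKernel('float *x', 'x[i] = 2.0f * x[i]', 'double_it')
double_it(arr)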

Is converting a pyCUDA GPUArray into a PyTorch tensor still not possible? If so, is it at least possible to specify the cuContext so that multiple threads belonging to the same process use the same context? (It seems the default behavior is to create a context for every thread.)
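
For the context question, would retaining the primary context work? Something like this (untested sketch using Device.retain_primary_context, instead of pycuda.autoinit, which creates a fresh context):

import pycuda.driver as cuda

cuda.init()
# the primary context is the one the CUDA runtime API (and hence PyTorch) uses
ctx = cuda.Device(0).retain_primary_context()
ctx.push()
# ... pyCUDA calls here would run in the same context as PyTorch ...
ctx.pop()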

I briefly experimented with something like this. I can’t really remember what the limitations were, since I later switched to cupy. I seem to remember that accessing the converted GPUArray or the Tensor was not problematic, but arithmetic on it failed at runtime.

import numpy as np
import torch

import pycuda.autoinit  # initializes a CUDA context on import
import pycuda.driver
from pycuda.gpuarray import GPUArray
from pycuda.driver import PointerHolderBase

class Holder(PointerHolderBase):

    def __init__(self, tensor):
        super().__init__()
        self.tensor = tensor
        self.gpudata = tensor.data_ptr()

    def get_pointer(self):
        return self.tensor.data_ptr()

    # without an __index__ method, arithmetic on the GPUArray backed by this
    # pointer fails at runtime; apparently pycuda needs the address as an integer
    def __index__(self):
        return self.gpudata

# dict to map between torch and numpy dtypes
dtype_map = {
    # signed integers
    torch.int8: np.int8,
    torch.int16: np.int16,
    torch.short: np.int16,
    torch.int32: np.int32,
    torch.int: np.int32,
    torch.int64: np.int64,
    torch.long: np.int64,

    # unsigned integers
    torch.uint8: np.uint8,

    # floating point
    torch.float: np.float32,
    torch.float32: np.float32,
    torch.float16: np.float16,
    torch.half: np.float16,
    torch.float64: np.float64,
    torch.double: np.float64
}


def torch_dtype_to_numpy(dtype):
    '''Convert a torch ``dtype`` to the equivalent numpy ``dtype``, if it is also usable with
    pycuda.
    Parameters
    ----------
    dtype   :   torch.dtype
    Returns
    -------
    np.dtype
    Raises
    ------
    ValueError
        If there is no numpy equivalent, or the equivalent would not work with pycuda
    '''

    from pycuda.compyte.dtypes import dtype_to_ctype
    if dtype not in dtype_map:
        raise ValueError(f'{dtype} has no numpy equivalent')
    else:
        candidate = dtype_map[dtype]
        # raise early by checking whether the type can be used with pycuda; otherwise
        # we would only notice later, when actually using the array
        try:
            _ = dtype_to_ctype(candidate)
        except ValueError:
            raise ValueError(f'{dtype} cannot be used in pycuda')
        else:
            return candidate
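
# a quick usage sketch (illustrative): torch_dtype_to_numpy(torch.float32)
# returns np.float32, whereas an unmapped dtype raises ValueError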


def numpy_dtype_to_torch(dtype):
    '''Convert a numpy ``dtype`` to the equivalent torch ``dtype``. If there are synonyms, the
    first matching entry is returned.
    Parameters
    ----------
    dtype   :   np.dtype
    Returns
    -------
    torch.dtype
    '''
    # return the torch dtype whose numpy equivalent matches the given dtype
    for dtype_t, dtype_n in dtype_map.items():
        if dtype_n == dtype:
            return dtype_t


def tensor_to_gpuarray(tensor):
    '''Convert a :class:`torch.Tensor` to a :class:`pycuda.gpuarray.GPUArray`. The underlying
    storage is shared, so modifications to the array will be reflected in the tensor object.
    Parameters
    ----------
    tensor  :   torch.Tensor
    Returns
    -------
    pycuda.gpuarray.GPUArray
    Raises
    ------
    ValueError
        If the ``tensor`` does not live on the gpu
    '''
    if not tensor.is_cuda:
        raise ValueError('Cannot convert CPU tensor to GPUArray (call `cuda()` on it)')
    else:
        # note: this assumes the tensor is contiguous, since GPUArray knows nothing
        # about torch strides; call `.contiguous()` on the tensor first if in doubt
        array = GPUArray(tensor.shape, dtype=torch_dtype_to_numpy(tensor.dtype),
                         gpudata=Holder(tensor))
        return array


def gpuarray_to_tensor(gpuarray):
    '''Convert a :class:`pycuda.gpuarray.GPUArray` to a :class:`torch.Tensor`. The underlying
    storage will NOT be shared, since a new copy must be allocated.
    Parameters
    ----------
    gpuarray  :   pycuda.gpuarray.GPUArray
    Returns
    -------
    torch.Tensor
    '''
    shape = gpuarray.shape
    dtype = gpuarray.dtype
    out_dtype = numpy_dtype_to_torch(dtype)
    out = torch.zeros(shape, dtype=out_dtype).cuda()
    gpuarray_copy = tensor_to_gpuarray(out)
    byte_size = gpuarray.itemsize * gpuarray.size
    # device-to-device copy from the source array into the new tensor's memory
    pycuda.driver.memcpy_dtod(gpuarray_copy.gpudata, gpuarray.gpudata, byte_size)
    return out
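
For completeness, a round trip with these helpers might look like this (untested sketch):

t = torch.arange(10, dtype=torch.float32).cuda()
arr = tensor_to_gpuarray(t)     # shares memory, so in-place changes show up in t
arr += 1                        # pycuda arithmetic on the shared buffer
back = gpuarray_to_tensor(arr)  # new torch tensor holding a copy of the data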

I don’t know if that helps you.