I want to do some computation on tensors which I cannot do with PyTorch without copying memory to the CPU. For this, I want to use scikit-cuda, which in turn relies on pyCUDA. pyCUDA has a GPUArray class which I somehow need to instantiate from the memory occupied by a tensor. So far I’ve only found the _cdata field on tensors and Storage classes, which appears to return some kind of address.
What is the exact nature of this address? Does it point to a contiguous block of memory, and could it be used to treat the data as a different type?
Hi,
I think this question has already been answered in this post.
You can find an example of how to do it here.
Unfortunately, the other direction is not addressed in the gist, i.e. how to create a tensor from a gpuarray.
Unfortunately, that is not possible at the moment.
The only workaround is to create a Tensor of the proper size, convert it to a gpuarray, and have your pyCUDA code modify this gpuarray in place.
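A minimal sketch of that workaround might look like this (untested here; it assumes PyTorch and pyCUDA end up using the same CUDA context, and all names are illustrative):

import numpy as np
import torch
import pycuda.autoinit  # caveat: pycuda.autoinit creates its own context
from pycuda.gpuarray import GPUArray
from pycuda.driver import PointerHolderBase
from pycuda.elementwise import ElementwiseKernel

class Holder(PointerHolderBase):
    # wraps the tensor's device pointer and keeps the tensor alive
    def __init__(self, tensor):
        super().__init__()
        self.tensor = tensor

    def get_pointer(self):
        return self.tensor.data_ptr()

    def __index__(self):
        # pycuda sometimes coerces the pointer holder to an integer
        return self.tensor.data_ptr()

# allocate the result tensor in PyTorch, then view its memory as a GPUArray
out = torch.zeros(16, dtype=torch.float32, device='cuda')
arr = GPUArray(tuple(out.shape), dtype=np.float32, gpudata=Holder(out))

# a pyCUDA kernel that writes through `arr` changes `out` in place
fill = ElementwiseKernel('float *x', 'x[i] = 42.0f', 'fill_42')
fill(arr)
torch.cuda.synchronize()
print(out)  # should be a tensor full of 42s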
Is converting a pyCUDA GPUArray into a PyTorch tensor still not possible? And is it at least possible to specify the cuContext so that multiple threads belonging to the same process use the same context (it seems like the default behavior is to create a context for every thread)?
I briefly experimented with something like this. I can’t really remember what the limitations were, since I later switched to cupy. I seem to remember that accessing the converted GPUArray or the Tensor was not problematic, but arithmetic on it failed at runtime.
import numpy as np
import torch

import pycuda.autoinit
import pycuda.driver
from pycuda.gpuarray import GPUArray
from pycuda.driver import PointerHolderBase


class Holder(PointerHolderBase):

    def __init__(self, tensor):
        super().__init__()
        self.tensor = tensor
        self.gpudata = tensor.data_ptr()

    def get_pointer(self):
        return self.tensor.data_ptr()

    # without an __index__ method, arithmetic calls to the GPUArray backed by
    # this pointer fail; not sure why, but apparently this needs to return an integer
    def __index__(self):
        return self.gpudata
# dict to map between torch and numpy dtypes
dtype_map = {
    # signed integers
    torch.int8: np.int8,
    torch.int16: np.int16,
    torch.short: np.int16,
    torch.int32: np.int32,
    torch.int: np.int32,
    torch.int64: np.int64,
    torch.long: np.int64,
    # unsigned integers
    torch.uint8: np.uint8,
    # floating point
    torch.float: np.float32,
    torch.float32: np.float32,
    torch.float16: np.float16,
    torch.half: np.float16,
    torch.float64: np.float64,
    torch.double: np.float64
}
def torch_dtype_to_numpy(dtype):
    '''Convert a torch ``dtype`` to an equivalent numpy ``dtype``, if it is also available in
    pycuda.

    Parameters
    ----------
    dtype : torch.dtype

    Returns
    -------
    np.dtype

    Raises
    ------
    ValueError
        If there is no numpy equivalent, or the equivalent would not work with pycuda
    '''
    from pycuda.compyte.dtypes import dtype_to_ctype

    if dtype not in dtype_map:
        raise ValueError(f'{dtype} has no numpy equivalent')
    else:
        candidate = dtype_map[dtype]
        # by checking whether the type can be used with pycuda, we can raise an exception
        # early; otherwise we would only notice when using the array
        try:
            _ = dtype_to_ctype(candidate)
        except ValueError:
            raise ValueError(f'{dtype} cannot be used in pycuda')
        else:
            return candidate
def numpy_dtype_to_torch(dtype):
    '''Convert a numpy ``dtype`` to the equivalent torch ``dtype``. The first matching one will
    be returned, if there are synonyms.

    Parameters
    ----------
    dtype : np.dtype

    Returns
    -------
    torch.dtype
    '''
    for dtype_t, dtype_n in dtype_map.items():
        if dtype_n == dtype:  # compare against the numpy dtype, not the torch key
            return dtype_t
def tensor_to_gpuarray(tensor):
    '''Convert a :class:`torch.Tensor` to a :class:`pycuda.gpuarray.GPUArray`. The underlying
    storage will be shared, so that modifications to the array will reflect in the tensor object.

    Parameters
    ----------
    tensor : torch.Tensor

    Returns
    -------
    pycuda.gpuarray.GPUArray

    Raises
    ------
    ValueError
        If the ``tensor`` does not live on the gpu
    '''
    if not tensor.is_cuda:
        raise ValueError('Cannot convert CPU tensor to GPUArray (call `cuda()` on it)')
    else:
        array = GPUArray(tensor.shape, dtype=torch_dtype_to_numpy(tensor.dtype),
                         gpudata=Holder(tensor))
        return array
def gpuarray_to_tensor(gpuarray):
    '''Convert a :class:`pycuda.gpuarray.GPUArray` to a :class:`torch.Tensor`. The underlying
    storage will NOT be shared, since a new copy must be allocated.

    Parameters
    ----------
    gpuarray : pycuda.gpuarray.GPUArray

    Returns
    -------
    torch.Tensor
    '''
    shape = gpuarray.shape
    dtype = gpuarray.dtype
    out_dtype = numpy_dtype_to_torch(dtype)
    out = torch.zeros(shape, dtype=out_dtype).cuda()
    gpuarray_copy = tensor_to_gpuarray(out)
    byte_size = gpuarray.itemsize * gpuarray.size
    pycuda.driver.memcpy_dtod(gpuarray_copy.gpudata, gpuarray.gpudata, byte_size)
    return out
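For reference, usage would look roughly like this (a sketch from memory and untested, so the in-place operations may be among the ones that fail at runtime):

t = torch.arange(8, dtype=torch.float32, device='cuda')
arr = tensor_to_gpuarray(t)     # shares memory with t
arr.fill(1.0)                   # in-place change should be visible in the tensor
print(t)

back = gpuarray_to_tensor(arr)  # allocates a fresh tensor and copies
print(back)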
I don’t know if that helps you.