Hi there!
I would like to convert a CuPy routine to PyTorch. The goal is to allocate pinned memory once, fill it with data (of variable size, up to the allocated capacity), and transfer it to the GPU.
I tried with untyped_storage, but I failed to create a tensor pointing at the data_ptr of a pinned tensor, and view is not possible because of the shape incompatibility.
Constraints:
- no more memory allocations than in the CuPy version
- no use of deprecated functions
Here is the CuPy code:
import cupy as cp
import numpy as np
import torch

max_nbytes = ...
h_pinned_mem = cp.cuda.alloc_pinned_memory(max_nbytes)  # memory allocation 1, at initialization (CPU)
# loop, threaded, etc., whatever...
variable_buffer_nbytes = ...  # from 1 to max_nbytes
...
random_buffer = np.random.bytes(variable_buffer_nbytes)  # memory allocation 2 (CPU)
h_pinned_array: np.ndarray = np.frombuffer(h_pinned_mem, dtype=np.uint8, count=variable_buffer_nbytes)
np.copyto(h_pinned_array, np.frombuffer(random_buffer, dtype=np.uint8))  # copy the bytes into the pinned view
d_cp_tensor = cp.empty((variable_buffer_nbytes,), cp.uint8)  # memory allocation 3 (GPU); in my software it is done at initialization and never reallocated
d_cp_tensor.set(h_pinned_array)
d_torch_tensor: torch.Tensor = torch.from_dlpack(d_cp_tensor)  # toDlpack() is deprecated; from_dlpack consumes the CuPy array directly
And here is my attempt in PyTorch:
h_pinned_tensor = torch.empty(
    max_nbytes, dtype=torch.uint8, device='cpu', requires_grad=False, pin_memory=True
)
# fill h_pinned_tensor with random_buffer, size=variable_buffer_nbytes
untyped_storage: torch.UntypedStorage = h_pinned_tensor.untyped_storage()
partial_untyped_storage = untyped_storage[0:variable_buffer_nbytes]
# Create a tensor pointing to partial_untyped_storage
# complete this
h_partial_pinned_tensor = ...  # <- must not allocate memory, just point at untyped_storage, with size variable_buffer_nbytes
#
d_torch_tensor: torch.Tensor = h_partial_pinned_tensor.to("cuda")
=> Is it possible, or do I have to stick with CuPy?
PS: no need to ask Claude or ChatGPT, they get it wrong.
I don't need any advice about memory-overflow checks, CUDA streams, CUDA synchronization, etc.
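In case it clarifies what I'm after, here is a minimal, CUDA-optional sketch of the behavior I expect. It rests on an assumption I would like confirmed: that a plain slice of a pinned tensor is an allocation-free view of the same storage (and stays pinned). The sizes are made up for illustration, and the pin_memory flag is only set when CUDA is available so the sketch runs anywhere:

```python
import torch

max_nbytes = 1 << 20            # hypothetical capacity, for illustration only
variable_buffer_nbytes = 12345  # hypothetical payload size, 1..max_nbytes

# Allocation 1 (CPU). pin_memory=True needs a CUDA build, so fall back
# to pageable memory on CPU-only machines (illustration only).
pin = torch.cuda.is_available()
h_pinned_tensor = torch.empty(max_nbytes, dtype=torch.uint8, pin_memory=pin)

# Assumption under test: a plain slice is a view — same storage,
# same data_ptr, no new allocation.
h_partial = h_pinned_tensor[:variable_buffer_nbytes]
assert h_partial.data_ptr() == h_pinned_tensor.data_ptr()
assert h_partial.untyped_storage().data_ptr() == h_pinned_tensor.untyped_storage().data_ptr()

# Fill through the view; the bytes land in the (pinned) buffer.
h_partial.copy_(torch.randint(0, 256, (variable_buffer_nbytes,), dtype=torch.uint8))

if pin:
    # Allocation 3 (GPU) done once; reuse it and copy only the valid prefix,
    # mirroring d_cp_tensor.set(h_pinned_array) in the CuPy version.
    d_tensor = torch.empty(max_nbytes, dtype=torch.uint8, device="cuda")
    d_tensor[:variable_buffer_nbytes].copy_(h_partial, non_blocking=True)
```

If that slicing assumption holds, the untyped_storage round-trip would not be needed at all; if it does not, that is exactly the gap I am asking about.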