Using CUDA IPC memory handles in pytorch

It will be a bit tricky to do correctly because small PyTorch storages are packed into the same CUDA allocation block. You will have to rely on implementation details of PyTorch that may change in the future:

x = torch.randn(100, device='cuda')
storage = x.storage()
device, handle, size, offset, view_size = storage._share_cuda_()

device is the index of the GPU (i.e. 0 for the first GPU)
handle is the cudaIpcMemHandle_t as a Python byte string
size is the size of the allocation (not the Storage!, in elements, not bytes!)
offset is the offset in bytes of the storage data pointer from the CUDA allocation
view_size is the size of the storage (in elements, not bytes!)

1 Like