Unexpected CUDA IPC handle behaviour

Hello, I’m currently trying to copy data from a C++ process into a PyTorch tensor. Since the data is already on the GPU and accessible through CUDA, I was thinking of using a simple cudaMemcpy to get an efficient device-to-device copy.

The problem is that I’d need a CUDA-accessible GPU address for the tensor.
Reading this previous post, it mentions that storage._share_cuda_() gives you access to the cudaIpcMemHandle_t, which in theory I could use either on the Python side (through libraries like pycuda) or on the C++ side to get the device pointer.

I’m having two issues. First, I’m getting a 66-byte array instead of the 64 bytes that a cudaIpcMemHandle_t actually occupies. Second, even when I arbitrarily take the last 64 bytes and pass them to pycuda’s IPCMemoryHandle, I get an invalid device context error.
When I try to open the handle in C++ it doesn’t work directly either.
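For reference, here is a small sketch of the “take the last 64 bytes” idea described above. The 64-byte size comes from CUDA_IPC_HANDLE_SIZE in CUDA’s driver_types.h; the assumption that any extra bytes form a prefix is just a guess mirroring my attempt, so verify it against your PyTorch version before relying on it. The helper name extract_ipc_handle is made up for illustration.

```python
# cudaIpcMemHandle_t is an opaque 64-byte struct
# (CUDA_IPC_HANDLE_SIZE in CUDA's driver_types.h).
CUDA_IPC_HANDLE_SIZE = 64


def extract_ipc_handle(raw: bytes) -> bytes:
    """Keep only the trailing CUDA_IPC_HANDLE_SIZE bytes of the blob
    returned for the handle field of storage._share_cuda_().

    Assumes any extra bytes are a metadata prefix -- this is an
    unverified assumption, not documented PyTorch behaviour.
    """
    if len(raw) < CUDA_IPC_HANDLE_SIZE:
        raise ValueError(f"blob too short: {len(raw)} bytes")
    return raw[-CUDA_IPC_HANDLE_SIZE:]


# Dummy 66-byte blob, like the one I was seeing:
blob = bytes(66)
handle = extract_ipc_handle(blob)
print(len(handle))  # 64
```

The resulting 64 bytes are what you would then try to feed to pycuda’s IPCMemoryHandle (or memcpy into a cudaIpcMemHandle_t on the C++ side).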

I’m quite unfamiliar with PyTorch, so any help would be appreciated.

Managed to get the issue fixed. Basically I had a bug in my C++ code.
I also got it to work by simply creating an IPC mem handle directly from the tensor’s device pointer (which I found while debugging).
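For anyone landing here later, the second approach can be sketched from the Python side with ctypes calling the CUDA runtime’s cudaIpcGetMemHandle on tensor.data_ptr(). This is a minimal sketch under my assumptions, not tested on your setup: the libcudart library name/path varies by platform, and get_ipc_handle is a hypothetical helper name.

```python
import ctypes

CUDA_IPC_HANDLE_SIZE = 64  # size of cudaIpcMemHandle_t


class CudaIpcMemHandle(ctypes.Structure):
    # cudaIpcMemHandle_t is opaque: just 64 reserved bytes.
    _fields_ = [("reserved", ctypes.c_char * CUDA_IPC_HANDLE_SIZE)]


def get_ipc_handle(device_ptr: int, libcudart: str = "libcudart.so") -> bytes:
    """Create a cudaIpcMemHandle_t for a raw device pointer,
    e.g. the value of tensor.data_ptr() for a CUDA tensor.

    The returned 64 bytes can be sent to the C++ process, which
    opens them with cudaIpcOpenMemHandle and then cudaMemcpy's
    into the mapped pointer.
    """
    cudart = ctypes.CDLL(libcudart)
    handle = CudaIpcMemHandle()
    # cudaError_t cudaIpcGetMemHandle(cudaIpcMemHandle_t*, void*)
    err = cudart.cudaIpcGetMemHandle(
        ctypes.byref(handle), ctypes.c_void_p(device_ptr)
    )
    if err != 0:
        raise RuntimeError(f"cudaIpcGetMemHandle failed with error {err}")
    return bytes(handle.reserved)
```

Usage would be something like `handle = get_ipc_handle(tensor.data_ptr())` on a CUDA tensor; note the allocation must come from cudaMalloc-backed memory for IPC to work.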