Hello, I’m currently trying to copy data from a C++ process into a PyTorch tensor. Since the data is already on the GPU and accessible through CUDA, I was thinking of using a simple cudaMemcpy for efficiency.
The problem is that I’d need a CUDA-accessible GPU address.
A previous post mentions that calling storage._share_cuda_() gives you access to the cudaIpcMemHandle_t, which in theory I could use either on the Python side (through libraries like PyCUDA) or on the C++ side to get the device pointer.
I’m running into two issues. First, I’m getting a 66-byte array instead of the 64 bytes that is cudaIpcMemHandle_t’s actual size. Second, even when I arbitrarily take the last 64 bytes and pass them to PyCUDA’s IPCMemoryHandle function, I get an “invalid device context” error.
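To make the size mismatch concrete, here is a minimal stand-in sketch of what I mean (the 66-byte value is simulated rather than taken from a real storage._share_cuda_() call, and whether the handle actually occupies the trailing 64 bytes is exactly the part I’m guessing at):

```python
import ctypes

# cudaIpcMemHandle_t is an opaque 64-byte struct (CUDA_IPC_HANDLE_SIZE)
CUDA_IPC_HANDLE_SIZE = 64

class CudaIpcMemHandle(ctypes.Structure):
    # mirror of the opaque struct, just to pin down the size
    _fields_ = [("reserved", ctypes.c_char * CUDA_IPC_HANDLE_SIZE)]

assert ctypes.sizeof(CudaIpcMemHandle) == CUDA_IPC_HANDLE_SIZE

# stand-in for the 66-byte value I get back from storage._share_cuda_()
raw = bytes(range(66))

# my (possibly wrong) guess: drop the 2 leading bytes, keep the trailing 64
candidate = raw[-CUDA_IPC_HANDLE_SIZE:]
print(len(candidate))  # 64
```

This slicing is purely a guess on my part; I don’t know whether the extra 2 bytes are a prefix, a suffix, or something version-specific in how PyTorch serializes the handle.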
When I try to open the handle on the C++ side with cudaIpcOpenMemHandle, it doesn’t work directly either.
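For reference, this is roughly what I’m attempting on the C++ side (a sketch only, assuming the 64 handle bytes have already been sent over from the Python process, e.g. via a pipe; the buffer contents here are placeholders):

```cpp
#include <cstdio>
#include <cstring>
#include <cuda_runtime.h>

int main() {
    // placeholder: in reality these 64 bytes come from the Python process
    unsigned char buf[CUDA_IPC_HANDLE_SIZE] = {};

    cudaIpcMemHandle_t handle;
    static_assert(sizeof(handle) == CUDA_IPC_HANDLE_SIZE, "handle size mismatch");
    std::memcpy(&handle, buf, sizeof(handle));

    void* devPtr = nullptr;
    cudaError_t err =
        cudaIpcOpenMemHandle(&devPtr, handle, cudaIpcMemLazyEnablePeerAccess);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaIpcOpenMemHandle failed: %s\n",
                     cudaGetErrorString(err));
        return 1;
    }

    // ... devPtr would then be usable as a cudaMemcpy source/destination ...
    cudaIpcCloseMemHandle(devPtr);
    return 0;
}
```

This fails for me too, which makes me suspect my guess about which 64 of the 66 bytes form the actual handle is wrong.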
I’m quite unfamiliar with PyTorch’s internals, so any help would be appreciated.