My preprocessing happens in CUDA memory (but not with libtorch). Currently I transfer the data to the CPU into an OpenCV matrix, create a libtorch tensor from it, and finally transfer the data back to the GPU, which is really inefficient. I am wondering whether it is possible to initialize a libtorch tensor directly from CUDA memory — for example, create an OpenCV GpuMat and build a tensor from its blob.
Just be careful with the CUDA device. If you specify a device ID explicitly, it needs to be the same as the device the data is on; if you want the tensor on a different device, the data needs to go through the CPU first.