torch::from_blob memory types

To my understanding, PyTorch is capable of doing inference directly on GPU data. For example, if the inputs are already on the GPU, I can point the network at the input's location in GPU memory and also specify where to place the output in GPU memory. The `torch::from_blob` method would be used for this.

My question is how the situation differs if the memory is not linear but instead a volume stored as a CUDA array (`cudaArray`). This type of structure doesn't have a pointer in the usual sense; under normal circumstances one would access it with `surf3Dwrite()` and `surf3Dread()` inside CUDA kernels. If the input is in this format, what is the recommended way to run inference on it?


Unfortunately, `from_blob` can only take a raw memory pointer (plus size/stride information), so you won't be able to wrap opaque data types like a `cudaArray` in a torch Tensor.
But if one copy is acceptable, you can dump the volume into a new contiguous memory buffer and wrap that.
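A sketch of that one-copy approach (assuming a float `cudaArray_t` of extent `w × h × d`; the function name and parameters are hypothetical) would use `cudaMemcpy3D` to unpack the array into tightly packed linear device memory, which `torch::from_blob` can then wrap:

```cpp
#include <cuda_runtime.h>
#include <torch/torch.h>

// Hypothetical helper: copy an existing float cudaArray into a linear
// device buffer, then wrap that buffer as a CUDA tensor.
torch::Tensor wrap_cuda_array(cudaArray_t array, int w, int h, int d) {
    // 1) Allocate a tightly packed linear buffer on the device.
    float* linear = nullptr;
    cudaMalloc(&linear, sizeof(float) * w * h * d);

    // 2) One device-to-device copy: cudaArray -> linear memory.
    //    (When a cudaArray is involved, the extent width is in elements.)
    cudaMemcpy3DParms p{};
    p.srcArray = array;
    p.dstPtr   = make_cudaPitchedPtr(linear, w * sizeof(float), w, h);
    p.extent   = make_cudaExtent(w, h, d);
    p.kind     = cudaMemcpyDeviceToDevice;
    cudaMemcpy3D(&p);

    // 3) Wrap the linear buffer. from_blob does not take ownership, so
    //    free `linear` only after the tensor (and any views) are gone.
    auto opts = torch::TensorOptions()
                    .dtype(torch::kFloat32)
                    .device(torch::kCUDA);
    return torch::from_blob(linear, {d, h, w}, opts);
}
```

The copy stays on the device, so no round trip through host memory is needed before inference.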

@albanD Thanks for the clarification