When the loaded PyTorch module returns a plain `Tensor`, this works great. Here is the code:
> // allocate your CUDA output buffer
> cudaMalloc((void**)&cuda_output_ptr, ...);
> // wrap it in a proper Torch Tensor (from_blob does not copy; the tensor aliases the buffer)
> at::Tensor predTensor = torch::from_blob((void*)cuda_output_ptr, c10::IntArrayRef(myTensorShape), tensor_options);
> // call the module's forward and assign the result via std::move()
> std::move(predTensor) = mModule.forward(std::move(inputs)).toTensor();
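As far as I can tell, this works because `at::Tensor` defines `&&`-qualified assignment operators that forward to `copy_()` (that is my reading of `TensorBody.h`, so treat it as an assumption), which makes the line above equivalent to an in-place copy:

> // equivalent formulation: copy_() writes the result into the
> // pre-allocated CUDA buffer that predTensor was created from
> predTensor.copy_(mModule.forward(std::move(inputs)).toTensor());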
But that does not work for me when the module outputs a `List[Tensor]` / `TensorList`. Did anyone have success with that? Here is my code:
> c10::List<at::Tensor> predTensorList;
> // wrap the CUDA-allocated output buffers...
> for (int i = 0; i < xxx; ++i) {
>     at::Tensor predTensor = torch::from_blob((void*)output_ptr,
>         c10::IntArrayRef(pytorchTensorShapeRefPtr), tensor_options);
>     predTensorList.push_back(std::move(predTensor));
> }
> std::move(predTensorList) = mModule.forward(std::move(inputs)).toTensorList();
The result is that the output CUDA buffers stay empty or keep their previous values.
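My suspicion is that `c10::List` has no `&&`-qualified `operator=` that copies into the element storages the way `at::Tensor` does, so the move-assignment just rebinds the list handle and the wrapped buffers are never written. A per-element `copy_()` might work instead; here is an untested sketch, assuming the returned list has the same length and per-element shapes as `predTensorList`:

> auto resultList = mModule.forward(std::move(inputs)).toTensorList();
> for (size_t i = 0; i < resultList.size(); ++i) {
>     // copy_() writes into the existing storage, i.e. the CUDA
>     // buffer each list element was created from with from_blob()
>     predTensorList.get(i).copy_(resultList.get(i));
> }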