If I have preallocated some buffers on my GPU using cudaMalloc(), can I force my neural net to write its output directly into one of those buffers? I know I can create a Tensor object from such a buffer using torch::from_blob, but it's not clear how to make the net write into that tensor while avoiding a redundant copy.
E.g:
void run(void* input_buffer, std::vector<int64_t> input_shape,
         void* output_buffer, std::vector<int64_t> output_shape) {
    auto model = torch::jit::load(model_path);
    // from_blob defaults to CPU, so the options must say the memory is on the GPU
    auto options = torch::TensorOptions().dtype(torch::kFloat32).device(torch::kCUDA);
    auto input_tensor = torch::from_blob(input_buffer, input_shape, options);
    auto output_tensor = torch::from_blob(output_buffer, output_shape, options);
    model.forward(input_tensor, output_tensor); // desired: model writes straight to output_buffer
}
I could definitely copy the data over from a temporary tensor, but I'm interested in finding a way to avoid that overhead (or, alternatively, an explanation of why what I'm trying to do doesn't make sense).
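For concreteness, the copy-based version I'd like to avoid looks roughly like this. This is only a sketch under my assumptions (float32 data, device pointers from cudaMalloc, a CUDA build of LibTorch, and a model_path defined elsewhere); forward() here allocates its own result tensor, so the last line is a device-to-device copy into my buffer:

```cpp
#include <torch/script.h>
#include <string>
#include <vector>

// Hypothetical signature for illustration; input_buffer/output_buffer are
// assumed to be device pointers obtained from cudaMalloc().
void run_with_copy(void* input_buffer, std::vector<int64_t> input_shape,
                   void* output_buffer, std::vector<int64_t> output_shape,
                   const std::string& model_path) {
    auto options = torch::TensorOptions()
                       .dtype(torch::kFloat32)
                       .device(torch::kCUDA);

    auto model = torch::jit::load(model_path);
    auto input_tensor = torch::from_blob(input_buffer, input_shape, options);
    auto output_tensor = torch::from_blob(output_buffer, output_shape, options);

    // forward() takes IValues and returns a freshly allocated tensor...
    auto result = model.forward({input_tensor}).toTensor();

    // ...so the result has to be copied into the preallocated buffer.
    // This device-to-device copy_ is exactly the overhead I want to skip.
    output_tensor.copy_(result);
}
```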
Thanks!