Writing output of neural net to specific buffer

Suppose I have preallocated some buffers on my GPU using cudaMalloc(). Can I force my neural net to write its output to one of those buffers? I know I'm able to create a Tensor object from those buffers using torch::from_blob, but it's not clear how I can make the neural net write into them without incurring a redundant copy.

E.g:

void run(void* input_buffer, std::vector<int64_t> input_shape,
         void* output_buffer, std::vector<int64_t> output_shape) {
    auto opts = torch::TensorOptions().dtype(torch::kFloat32).device(torch::kCUDA);
    auto model = torch::jit::load(model_path);
    auto input_tensor = torch::from_blob(input_buffer, input_shape, opts);
    auto output_tensor = torch::from_blob(output_buffer, output_shape, opts);
    model.forward(input_tensor, output_tensor); // desired (hypothetical) API: model writes to output buffer
}

I could definitely copy the data over from some temporary tensor but I’m interested in finding a way to avoid that overhead (or potentially an explanation why what I’m trying to do doesn’t make sense).

Thanks!

Some PyTorch operators provide an out argument, which lets you write the result directly into a given tensor. Depending on your model's last operation, you might be able to use this approach.

Thanks for your response!

I think I'm looking for something a bit more generic, but maybe it really does just depend on what the last operator in the model is doing.