Is it possible to run a forward pass with preallocated output tensors?

For example, can we call the forward function like this?

model = torch.load(…)
input = torch.Tensor(…)
output = torch.Tensor(…)
model.forward(input, output=[output])

When running inference, I need to store the model's output in a tensor backed by CUDA IPC shared memory, but I haven't found a way to execute the model with a preallocated shared-memory tensor as its output. For now I just memcpy from the model's output into the shared-memory tensor, which seems inefficient.
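For reference, here is a minimal sketch of the copy-based workaround described above. It makes several assumptions: the `nn.Linear` model and the tensor shapes are placeholders, and a plain CPU tensor stands in for the CUDA IPC shared-memory tensor so the snippet runs anywhere.

```python
import torch
import torch.nn as nn

# Placeholder model (assumption); the real one would come from torch.load(...)
model = nn.Linear(4, 2).eval()

inp = torch.randn(3, 4)

# Stand-in for the preallocated CUDA IPC shared-memory tensor (assumption)
shared_out = torch.empty(3, 2)

with torch.no_grad():
    result = model(inp)       # the model allocates its own output tensor
    shared_out.copy_(result)  # extra device copy into the preallocated buffer
```

The `copy_` call is the extra step I would like to avoid: `result` and `shared_out` live in separate storage, so every inference pays for one additional copy.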