Why do the forward and backward functions of an operator allocate new tensors?

Hi, I am looking at the extension-cpp example.

I am wondering why you allocate memory for the output tensors each time forward or backward is called.
Sure, in the forward case you could provide a preallocated output tensor,
but in the backward case this is not possible without bad hacks.
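For reference, here is a minimal sketch of what an out-variant forward could look like in a C++ extension. The function names and the toy operation (an elementwise multiply) are my own illustration, not part of extension-cpp:

```cpp
#include <torch/extension.h>

// Hypothetical out-variant: the caller passes a preallocated `output`
// tensor and the kernel writes into it instead of allocating a new one.
torch::Tensor forward_out(const torch::Tensor& input,
                          const torch::Tensor& weight,
                          torch::Tensor& output) {
  at::mul_out(output, input, weight);  // write result into the given buffer
  return output;
}

// The usual allocating variant, in the style of the extension-cpp example.
torch::Tensor forward(const torch::Tensor& input,
                      const torch::Tensor& weight) {
  auto output = torch::empty_like(input);  // new tensor on every call
  return forward_out(input, weight, output);
}

PYBIND11_MODULE(TORCH_EXTENSION_NAME, m) {
  m.def("forward", &forward, "forward (allocating)");
  m.def("forward_out", &forward_out, "forward (writes into preallocated output)");
}
```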

Does PyTorch internally take care that old output tensors get reused, or is this not a big performance issue?
My forward benchmark runs about 1.5% faster when I do not allocate a new tensor.

PyTorch uses a caching allocator and reuses already allocated memory.
I think the example code focuses on readability and, as you said, these small performance gains are sometimes not possible without bad hacks.
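To illustrate the caching-allocator point, here is a small libtorch sketch: once a CUDA tensor is freed, its block stays in PyTorch's cache, so a new allocation of the same size typically gets the cached block back instead of triggering another cudaMalloc. The exact pointer reuse shown here is typical behaviour, not a guarantee:

```cpp
#include <torch/torch.h>
#include <iostream>

int main() {
  if (!torch::cuda::is_available()) return 0;

  void* first_ptr = nullptr;
  {
    auto a = torch::empty({1024, 1024}, torch::kCUDA);  // fresh CUDA allocation
    first_ptr = a.data_ptr();
  }  // `a` is freed here; its block returns to the caching allocator

  auto b = torch::empty({1024, 1024}, torch::kCUDA);    // same size again
  std::cout << "reused cached block: "
            << (b.data_ptr() == first_ptr) << std::endl;
}
```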