Hello!

Is there a way to update a sub-tensor (patch) of an output tensor in place, without re-computing the whole output tensor?

To be specific, I have a series of similar input matrices whose values differ only in small patches scattered around. I am performing matrix multiplication on these matrices, but to reduce computation and memory usage I would like to compute with only the changed portions of the input and update the output matrix in-place after the initial full computation.

I can locate the changes and use slices of the input and weight matrices with the linear op (torch.nn.functional.linear) to compute only the patch of the output that needs updating. However, I still need to copy and scatter that output patch into the original output tensor to actually update it. I know that with BLAS libraries I can achieve **in-place computation and update** in a single GEMM call by setting *m* to the *number of rows of the submatrix*, *ldc* to the *number of rows of the original matrix*, and *C* to the start of the submatrix (assuming column-major matrices).
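Here is a minimal sketch of the workaround I'm describing (variable names and the patch indices r0:r1 are my own, assuming for simplicity that only whole rows of the input changed, so only the corresponding output rows need updating):

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

x = torch.randn(64, 32)   # input
w = torch.randn(16, 32)   # weight, (out_features, in_features) as F.linear expects
y = F.linear(x, w)        # initial full computation: y = x @ w.T, shape (64, 16)

# A small patch of x changes: here, rows r0:r1 (hypothetical indices).
r0, r1 = 10, 14
x[r0:r1] = torch.randn(r1 - r0, 32)

# Recompute only the affected output rows...
patch = F.linear(x[r0:r1], w)   # (r1 - r0, 16) -- a new temporary tensor

# ...then scatter the patch back into y. This copy is the step I'd like to avoid.
y[r0:r1] = patch

assert torch.allclose(y, F.linear(x, w))  # matches a full recomputation
```

The result is correct, but `patch` is a freshly allocated tensor that only exists to be copied into `y`.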

Is there a way to similarly specify the *storage offset*, *stride*, and *data_ptr* of the output tensor for a given PyTorch op (matrix multiplication, convolution, add, etc.)? Or will these ops always allocate a new contiguous tensor for the output?
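For example, what I'm hoping for is something along these lines, where the tensor passed as out= is a view sharing storage with the original output (a sketch with made-up shapes; I'm not sure whether out= guarantees no internal temporary for every op):

```python
import torch

torch.manual_seed(0)

x = torch.randn(64, 32)
w = torch.randn(32, 16)
y = x @ w                 # initial full computation, (64, 16), contiguous

# Hypothetical changed rows of the input.
r0, r1 = 10, 14
x[r0:r1] = torch.randn(r1 - r0, 32)

# y[r0:r1] is a view: same underlying storage as y, just with a storage
# offset. Passing it as out= asks torch.mm to write the result directly
# there -- analogous to pointing C at the submatrix and setting ldc to the
# leading dimension of the full matrix in a BLAS GEMM call.
torch.mm(x[r0:r1], w, out=y[r0:r1])

assert torch.allclose(y, x @ w)  # y was updated in place through the view
```

This appears to work for a row slice (which is itself contiguous in a row-major tensor), but I'd like to know whether this pattern is generally supported, or whether some ops silently compute into a temporary and copy.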

Thank you in advance for your help!