Some ops are implemented as a composition of other ATen functions. For example, matmul is stitched together from expand, view, and bmm inside at::native::matmul. Suppose I implement my own matmul kernel and register it like this:
torch::RegisterOperators().op("aten::matmul…").impl_unboxedOnlyKernel<…>(…)
How can I make dispatch skip the stitching inside at::native::matmul and reach my custom kernel directly, without modifying the at::native::matmul source code?
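For context, here is a sketch of the kind of registration I am attempting, using the legacy torch::RegisterOperators API. The kernel name my_matmul and its body are hypothetical placeholders; my understanding (which may be wrong, hence this question) is that registering under a backend-specific dispatch key such as CPU, rather than as a catch-all kernel, should let the dispatcher reach the custom kernel before the composite decomposition in at::native::matmul runs:

```cpp
#include <torch/torch.h>

// Hypothetical custom kernel with the same signature as aten::matmul.
torch::Tensor my_matmul(const torch::Tensor& self, const torch::Tensor& other) {
  // Placeholder body; a real kernel would implement matmul semantics
  // (broadcasting, 1-D/2-D promotion, etc.) itself.
  return at::bmm(self.unsqueeze(0), other.unsqueeze(0)).squeeze(0);
}

// Register for the CPU dispatch key. Since at::native::matmul is registered
// as a catch-all (composite) kernel, a backend-specific kernel should take
// precedence in dispatch -- this is the behavior I am asking about.
static auto registry = torch::RegisterOperators().op(
    "aten::matmul(Tensor self, Tensor other) -> Tensor",
    torch::RegisterOperators::options()
        .kernel<decltype(my_matmul), &my_matmul>(c10::DispatchKey::CPU));
```

This is only a sketch against the API shown above; if the newer TORCH_LIBRARY_IMPL macros are the recommended route for overriding an existing aten op per backend, pointers to that would also be appreciated.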