Hello, I am toying with a backend that cannot make use of TensorIterator for various reasons. I noticed that if a kernel is structured, I can get away with implementing only the structured delegate, such as `mm.out` in the case of `mm`. That doesn't seem to be the case if the delegate inherits from `TensorIteratorBase`, as in the case of `add.Tensor`: the codegen invokes the TensorIterator form even though my backend provides an `add.out`.
- Is this expected behavior?
- If not, is there a way in the codegen to reduce the boilerplate for external backends? I have looked at what `torch_xla` does, and it does not seem to use the same structured kernel mechanism (unless I am misinterpreting it).