New backends and structured TensorIterator kernels

Hello, I am toying with a backend which cannot make use of TensorIterator for various reasons. I noticed that if a kernel is structured, I can get away with implementing just the structured delegate, e.g. mm.out in the case of mm. That doesn't seem to be the case when the delegate's meta class inherits from TensorIteratorBase, as with add.Tensor: the codegen invokes the TensorIterator path even though my backend provides an add.out.
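For context, the difference seems to come down to the meta functions. Paraphrasing the in-tree definitions (from aten/src/ATen/native; these macros only compile inside ATen's generated meta classes, so this is illustrative rather than standalone, and details may not match exactly): mm's meta function only validates shapes and declares the output, while add.Tensor's meta function constructs a TensorIterator.

```cpp
// mm: the meta function just checks shapes and declares the output.
TORCH_META_FUNC(mm)(const at::Tensor& self, const at::Tensor& mat2) {
  TORCH_CHECK(self.dim() == 2 && mat2.dim() == 2, "mm expects 2-D tensors");
  TORCH_CHECK(self.sizes()[1] == mat2.sizes()[0], "mm shape mismatch");
  set_output_raw_strided(0, {self.sizes()[0], mat2.sizes()[1]}, {},
                         self.options());
}

// add.Tensor: the meta function builds a TensorIterator, which is where
// a backend that cannot use TensorIterator fails.
TORCH_META_FUNC2(add, Tensor)(
    const at::Tensor& self, const at::Tensor& other, const at::Scalar& alpha) {
  build_borrowing_binary_op(maybe_get_output(), self, other);
  at::native::alpha_check(dtype(), alpha);
}
```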

  1. Is this expected behavior?
  2. If not, is there a codegen mechanism to reduce the boilerplate for external backends? I have looked at what torch_xla does, and it does not seem to use the same structured-kernel mechanism (unless I am misinterpreting it).

I did some digging, and I was mistaken: the generated code does invoke my implementation of add.out, but it fails earlier, in the meta() function, because that function builds a TensorIterator. So I guess my question is whether there is a codegen mechanism to bypass TensorIterator use without having to reimplement every variant of the operator.
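For now, the only workaround I can see is to drop out of the structured path entirely and register plain (unstructured) kernels for every variant of the operator myself. A minimal sketch, assuming a PrivateUse1 backend; the my_backend_* names are placeholders, not real APIs:

```cpp
#include <ATen/ATen.h>
#include <torch/library.h>

// Hypothetical out-of-tree kernel: writes self + alpha * other into out.
// Note: this sketch ignores broadcasting, type promotion, and shape
// checking, which the TensorIterator-based meta() normally handles.
at::Tensor& my_backend_add_out(const at::Tensor& self, const at::Tensor& other,
                               const at::Scalar& alpha, at::Tensor& out) {
  // ... backend-specific elementwise add, no TensorIterator involved ...
  return out;
}

at::Tensor my_backend_add(const at::Tensor& self, const at::Tensor& other,
                          const at::Scalar& alpha) {
  // Functional variant: allocate the output, then reuse the out kernel.
  at::Tensor out = at::empty(self.sizes(), self.options());
  return my_backend_add_out(self, other, alpha, out);
}

at::Tensor& my_backend_add_(at::Tensor& self, const at::Tensor& other,
                            const at::Scalar& alpha) {
  // In-place variant writes into self.
  return my_backend_add_out(self, other, alpha, self);
}

// Registering all three variants directly bypasses the generated
// structured wrapper (and therefore its TensorIterator-based meta()).
TORCH_LIBRARY_IMPL(aten, PrivateUse1, m) {
  m.impl("add.Tensor", my_backend_add);
  m.impl("add.out", my_backend_add_out);
  m.impl("add_.Tensor", my_backend_add_);
}
```

The obvious downside is that the broadcasting, type promotion, and shape checking the meta() function normally provides now have to be reimplemented per operator, which is exactly the boilerplate I was hoping codegen could avoid.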