What is the plan for TensorExpr?

I have noticed some exciting stuff going on in the torch/csrc/jit/tensorexpr directory in the git repo.

It seems to define an IR plus a codegen layer. I'm very interested in hearing what the end goal is with this. Obviously, what's there seems immediately useful for optimizing execution graphs (e.g. fusing kernels). I haven't seen anything that generates the IR, though. Is the idea to define kernels directly in the IR, so they can be optimized more easily? Or are there plans for a front end that generates the IR?

I ask because many of the pieces seem to be in place for a numba-like system, where custom kernels can be defined in a subset of Python and fast code can be generated for GPU and CPU. Are there plans for anything like this, or is that just my own fantasy? (I see that there's something new going on in the torch.fx namespace, but I don't know what it's for.) The missing piece seems to be something that can take a Python AST (or some other high-level language) and generate the IR.
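To make concrete what such a front end would start from: Python's standard-library ast module already gives you the high-level representation a lowering pass would consume. This is just a sketch of that first step, not anything tensorexpr actually provides; the saxpy kernel below is a made-up example.

```python
import ast

# A toy "kernel" written in a small subset of Python -- the kind of
# input a hypothetical numba-like front end would accept.
src = """
def saxpy(a, x, y):
    return a * x + y
"""

tree = ast.parse(src)
fn = tree.body[0]          # the FunctionDef node
expr = fn.body[0].value    # the a * x + y expression

# A lowering pass would walk this AST and emit IR nodes; here we
# just inspect the structure the walker would see.
print(fn.name)             # saxpy
print(ast.dump(expr))      # BinOp(left=BinOp(... Mult ...), op=Add(), ...)
```

From a tree like this, a front end would map each BinOp to the corresponding IR expression node and each argument to a tensor placeholder; the hard parts (shape inference, scheduling, GPU codegen) are exactly what the tensorexpr machinery would need to supply.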

There's very little innovation in the basic weight layers of neural networks, IMO: there are convolutions and linear layers, and not much else. I suspect a big reason for this is that it's hard to produce a performant implementation of a new idea. I was a big fan of the Tensor Comprehensions project a couple of years back, but that died out.

I would be very happy to hear a bit more about the roadmap for this feature.