There is indeed ongoing work in the different JIT backends, which use code generation to create (fused) kernels. I’m somewhat familiar with the internals of the nvfuser work, but unfortunately cannot link to proper documentation, as it’s still at an early stage.
In any case, I expect to see more code generation approaches in the future, which would make writing custom operations in the framework easier.
So the current plan is JIT all the way? No lower-level script language for kernels specifically (like TCs)?
And if it is going to be JIT-based, are there any plans/work for a static diagnostic that tells you how well certain parts of the source TorchScript have managed to generate fused low-level code?
Something like a function summarize_compilation(script(model, method)) that returns a report on which sections fused and which didn’t. You can definitely tell I’ve no idea about this area beyond “fused -> probably good”.
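To make the idea concrete, here is a rough sketch of what such a report could look like. It is not an existing PyTorch API: the name summarize_compilation is the hypothetical one from above, and as a simplification it takes the textual dump of an optimized TorchScript graph instead of a scripted model. The list of fusion node kinds is also an assumption and may not match every PyTorch version or backend.

```python
import re
from collections import Counter

# Assumption: node kinds that TorchScript fusers have used to mark fused
# subgraphs. Illustrative only; the set differs between versions/backends.
FUSION_KINDS = {
    "prim::FusionGroup",
    "prim::CudaFusionGroup",
    "prim::TensorExprGroup",
}


def summarize_compilation(graph_text: str) -> dict:
    """Hypothetical helper: report fused groups and unfused aten ops.

    `graph_text` is the printed form of an optimized TorchScript graph.
    """
    # Subgraph bodies are printed after the main graph as
    # "with prim::XGroup_0 = graph(...)"; only look at the top-level part.
    top_level = graph_text.split("\nwith ", 1)[0]

    # Each node is printed as "%out : Type = namespace::op(...)".
    kinds = re.findall(r"=\s*(\w+::\w+)", top_level)

    fused_groups = 0
    unfused_aten_ops = Counter()
    for kind in kinds:
        # Fusion groups carry a numeric suffix in the dump, e.g. "_0".
        base = re.sub(r"_\d+$", "", kind)
        if base in FUSION_KINDS:
            fused_groups += 1
        elif base.startswith("aten::"):
            unfused_aten_ops[base] += 1

    return {
        "fused_groups": fused_groups,
        "unfused_aten_ops": dict(unfused_aten_ops),
    }
```

With PyTorch installed, one way to get the input might be str(torch.jit.last_executed_optimized_graph()) after running the scripted function a few times so the profiling executor has had a chance to optimize it. Treat that as an assumption to verify against your version. A count of groups is only a coarse signal; it says nothing about how good the generated kernels are.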