Why does PyTorch use Triton as the backend of Inductor?

The fact that Inductor uses Triton as its code-generation target is quite puzzling to me. I’d really appreciate any insights on this matter.

Why the Triton language

In pursuit of higher performance and more fine-grained control over hardware, most compilers choose to generate a lower-level IR rather than a source language, unless that language is extremely well optimized for the hardware (such as TVM generating CUDA C). In most cases, the code a compiler generates does not need to be modified by hand at the language level. I would like to understand what factors led PyTorch to choose the Triton language as the lowering target for Inductor.
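For concreteness, the Triton source that Inductor emits is easy to inspect, which shows that the lowering target really is human-readable kernel code rather than an opaque IR. A minimal sketch (assuming a CUDA device; the script name `repro.py` is just illustrative, and the `TORCH_LOGS="output_code"` environment variable asks Inductor to print the kernels it generates):

```python
import torch

# Run as: TORCH_LOGS="output_code" python repro.py
# Inductor (the default torch.compile backend) then prints the
# Triton kernels it generates for this fused pointwise+reduction graph.

def f(x, y):
    return (x + y).relu().sum(dim=-1)

compiled = torch.compile(f)

x = torch.randn(1024, 1024, device="cuda")
y = torch.randn(1024, 1024, device="cuda")
compiled(x, y)  # the first call triggers compilation and Triton codegen
```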

What’s next
Another thing I’m curious about: what are the criteria for evaluating and selecting a backend language? Triton demonstrates strong expressiveness and performance on large language models, but in other domains its limitations and gaps in expressiveness are currently quite apparent. In terms of performance and expressiveness, both Mojo and cuTile seem to have the potential to surpass Triton. Under what circumstances might Inductor consider replacing its backend language or compiler in the future? (Inductor already varies its codegen target by device, as in the sketch below, so presumably a replacement would slot in at that level.)
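A minimal sketch of that per-device behavior: on CPU, Inductor lowers the same graph to C++/OpenMP source instead of Triton. The generated code for either path can be inspected with `TORCH_LOGS="output_code"` as above:

```python
import torch

@torch.compile
def g(x):
    return torch.nn.functional.gelu(x) * 2.0

# CPU path: Inductor emits C++/OpenMP kernels for this graph.
g(torch.randn(64, 64))

# CUDA path: the same graph is lowered to Triton kernels instead.
if torch.cuda.is_available():
    g(torch.randn(64, 64, device="cuda"))
```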

Thank you in advance for any responses on this topic.

@leslie-fang-intel @smth ping

@ptrblck @richard ping