Inductor vs. Extending Dispatcher?

First of all, great job on PyTorch 2! I have had lots of fun playing with the new features, and thanks a lot for the great work!

I am currently looking at Inductor to see whether I can connect PT2 to a custom backend for FPGA-style hardware, but I am having a hard time finding the right way to do it. My questions are as follows:

  1. Core ATen vs. Prims IR: My understanding is that type promotion and broadcasting are handled at the Prims level, which sits below Core ATen. Hypothetically, if I am generating accelerator code from the mid-level Core ATen graph (before lowering to Inductor IR is called), should I handle type promotion and broadcasting myself? If so, where can I look up these rules? (There is a small sketch of what I mean below the list.)

  2. Organizing code for target-specific code generation: What is the convention for organizing codegen-specific functions so as to maximize reuse of the implementations already provided in PT2?
    For example, let’s say we have dumb hardware that can only run addmm.

  • If we want to do codegen at the FX graph level, which file and function would we need to override to emit this dumb_hardware.addmm? (See the second sketch after this list for roughly what I mean.)
  • If we want to do codegen at the Inductor IR level (the loop-based IR), which file and function would we override to run dumb_hardware.addmm? Are there hooks for controlling the loop style, e.g., reduction loops vs. systolic-array loops vs. SIMD loops, etc.?
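
For question 1, here is a minimal sketch of what I mean by handling promotion and broadcasting myself. It only uses public helpers (torch.promote_types and torch.broadcast_shapes); I am assuming, but am not sure, that these follow the same rules the Prims decompositions apply:

```python
import torch

def infer_add_output_meta(a: torch.Tensor, b: torch.Tensor):
    # Type promotion: e.g. int32 + float32 -> float32
    out_dtype = torch.promote_types(a.dtype, b.dtype)
    # Broadcasting: e.g. (3, 1) + (1, 4) -> (3, 4)
    out_shape = torch.broadcast_shapes(a.shape, b.shape)
    return out_shape, out_dtype

a = torch.ones(3, 1, dtype=torch.int32)
b = torch.ones(1, 4, dtype=torch.float32)
print(infer_add_output_meta(a, b))  # (torch.Size([3, 4]), torch.float32)
```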
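And for question 2, this is roughly what I have in mind at the FX graph level: a custom torch.compile backend that retargets addmm nodes to a hypothetical dumb_hardware wrapper (the wrapper and its codegen are placeholders for the real accelerator path). I am not sure this is the intended extension point, which is exactly what I would like to confirm:

```python
import torch
import torch.fx

# Placeholder for the real accelerator kernel / codegen path.
def dumb_hardware_addmm(bias, mat1, mat2):
    return torch.addmm(bias, mat1, mat2)  # fall back to eager for now

def dumb_hardware_backend(gm: torch.fx.GraphModule, example_inputs):
    # A plain Dynamo backend sees torch-level calls (torch.addmm); an
    # aot_autograd-wrapped backend would see ATen ops instead, so match both.
    addmm_targets = (torch.addmm, torch.ops.aten.addmm.default)
    for node in gm.graph.nodes:
        if node.op == "call_function" and node.target in addmm_targets:
            node.target = dumb_hardware_addmm
    gm.recompile()
    return gm.forward  # callable that torch.compile will invoke

@torch.compile(backend=dumb_hardware_backend)
def f(bias, x, w):
    return torch.addmm(bias, x, w)

print(f(torch.zeros(4), torch.randn(2, 3), torch.randn(3, 4)).shape)  # torch.Size([2, 4])
```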

Any pointers will be greatly appreciated. Thanks so much!
