Inductor vs. Extending Dispatcher?

First of all, great job on PyTorch 2! I have had lots of fun playing with the new features, and thanks a lot for the great work!

I am currently looking at Inductor to see whether I can connect PT2 to a custom backend for FPGA-style hardware, but I am having a hard time finding the right way to do it. My questions are as follows:

  1. Core ATen vs. Prims IR: My understanding is that type promotion and broadcasting are handled at the Prims level, which sits below Core ATen. Hypothetically, if I am generating accelerator code from the mid-level Core ATen graph (before lowering to Inductor IR is called), should I handle type promotion and broadcasting myself? If so, where can I look up these rules? (There is a small sketch of what I mean below the list.)

  2. Organizing code for target-specific code generation: What is the convention for organizing codegen-specific functions so as to maximize reuse of the implementations already provided in PT2?
    For example, let’s say we have dumb hardware that can only run addmm.

  • If we want to do codegen at the FX graph level, which file and function would we need to override to emit this dumb_hardware.addmm? (See the second sketch after this list for roughly what I mean.)
  • If we want to do codegen at the Inductor IR level (the loop-based IR), which file and function would we override to run dumb_hardware.addmm? Are there hooks for controlling the loop style, e.g., reduction loops vs. systolic-array loops vs. SIMD loops, etc.?
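
For question 1, here is a minimal sketch of what I mean by handling promotion and broadcasting myself. It only uses public helpers (torch.promote_types and torch.broadcast_shapes); I am assuming, but am not sure, that these follow the same rules the Prims decompositions apply:

```python
import torch

def infer_add_output_meta(a: torch.Tensor, b: torch.Tensor):
    # Type promotion: e.g. int32 + float32 -> float32
    out_dtype = torch.promote_types(a.dtype, b.dtype)
    # Broadcasting: e.g. (3, 1) + (1, 4) -> (3, 4)
    out_shape = torch.broadcast_shapes(a.shape, b.shape)
    return out_shape, out_dtype

a = torch.ones(3, 1, dtype=torch.int32)
b = torch.ones(1, 4, dtype=torch.float32)
print(infer_add_output_meta(a, b))  # (torch.Size([3, 4]), torch.float32)
```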
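And for question 2, this is roughly what I have in mind at the FX graph level: a custom torch.compile backend that retargets addmm nodes to a hypothetical dumb_hardware wrapper (the wrapper and its codegen are placeholders for the real accelerator path). I am not sure this is the intended extension point, which is exactly what I would like to confirm:

```python
import torch
import torch.fx

# Placeholder for the real accelerator kernel / codegen path.
def dumb_hardware_addmm(bias, mat1, mat2):
    return torch.addmm(bias, mat1, mat2)  # fall back to eager for now

def dumb_hardware_backend(gm: torch.fx.GraphModule, example_inputs):
    # A plain Dynamo backend sees torch-level calls (torch.addmm); an
    # aot_autograd-wrapped backend would see ATen ops instead, so match both.
    addmm_targets = (torch.addmm, torch.ops.aten.addmm.default)
    for node in gm.graph.nodes:
        if node.op == "call_function" and node.target in addmm_targets:
            node.target = dumb_hardware_addmm
    gm.recompile()
    return gm.forward  # callable that torch.compile will invoke

@torch.compile(backend=dumb_hardware_backend)
def f(bias, x, w):
    return torch.addmm(bias, x, w)

print(f(torch.zeros(4), torch.randn(2, 3), torch.randn(3, 4)).shape)  # torch.Size([2, 4])
```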

Any pointers will be greatly appreciated. Thanks so much!
