Prologue fusion

Hi everyone, I’m new to PyTorch.

I have questions about fusion in Inductor.

I think Inductor currently supports epilogue fusion, but does not support prologue fusion.
For example, Inductor fuses mm followed by relu (an epilogue), but when the order is relu followed by mm (a prologue), no fusion happens. In the Inductor IR, it looks like mm always reads its input through a StarDep, which does not match the MemoryDep that relu writes as its output.

If I want to enable prologue fusion, what would be the most reasonable way to extend the Inductor code?

can_fuse_vertical has conditions that partially accept unmet_dependencies. I’m wondering whether these conditions can be extended so that a StarDep can be compared with a MemoryDep somehow. For example, can a StarDep be cast to a MemoryDep under some conditions?
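
To make the question concrete, here is a toy sketch of the mismatch as I understand it. It is not the actual Inductor code: ToyMemoryDep, ToyStarDep and toy_can_fuse_vertical are hypothetical stand-ins I made up for illustration, and the matching rule is a simplification based on my reading of can_fuse_vertical.

```python
from dataclasses import dataclass
from typing import Tuple


# Toy stand-ins for illustration only, NOT the real Inductor classes.
@dataclass(frozen=True)
class ToyMemoryDep:
    name: str              # buffer name
    index: str             # indexing expression, kept as a plain string here
    size: Tuple[int, ...]  # iteration sizes


@dataclass(frozen=True)
class ToyStarDep:
    name: str              # depends on the whole buffer, no index information


def toy_can_fuse_vertical(producer_writes, consumer_unmet_deps):
    """Simplified check: every unmet read of a producer buffer must
    exactly match one of the producer's writes (same type, name, index, size)."""
    producer_names = {w.name for w in producer_writes}
    matched = set()
    for rd in consumer_unmet_deps:
        for wr in producer_writes:
            # A ToyStarDep never equals a ToyMemoryDep, so it is never matched.
            if isinstance(rd, ToyMemoryDep) and rd == wr:
                matched.add(rd)
    remaining = {d.name for d in set(consumer_unmet_deps) - matched}
    # Fusion is rejected if any unmatched read still points at a producer buffer.
    return not (remaining & producer_names)


buf0_write = ToyMemoryDep("buf0", "c0", (1024,))

# Epilogue-like case: the pointwise consumer reads buf0 with the same index.
epilogue_ok = toy_can_fuse_vertical(
    [buf0_write], [ToyMemoryDep("buf0", "c0", (1024,))]
)

# Prologue-like case: the mm-like consumer only has a whole-buffer dependency.
prologue_ok = toy_can_fuse_vertical([buf0_write], [ToyStarDep("buf0")])

print(epilogue_ok, prologue_ok)
```

In this toy model the prologue case is rejected only because the whole-buffer dependency carries no index to compare, which is why I am asking whether a StarDep could be treated as a MemoryDep in some cases.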

I’m also wondering whether TemplateBuffer could handle MemoryDep as well.

Any comments would be helpful. Thank you.