Prologue fusion

MichihiroHorie · January 23, 2024, 8:41am

Hi everyone, I’m new to PyTorch.

I have questions about fusion in Inductor.

I think Inductor currently supports epilogue fusion but not prologue fusion.
For example, Inductor fuses the two nodes when mm is followed by relu (relu is an epilogue of mm), but when relu is followed by mm (relu would be a prologue of mm), no fusion happens. In the Inductor IR, it looks like mm always records its reads as StarDep, which does not match the MemoryDep that relu produces for its output.
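To make the mismatch concrete, here is a minimal pure-Python toy model. It is not Inductor's real code: ToyMemoryDep, ToyStarDep and deps_match are hypothetical stand-ins that only mirror the shape of the read/write check in can_fuse_vertical (same name, same dep type, then index and size), and the buffer names, index strings and sizes are illustrative. It shows why mm -> relu can match while relu -> mm cannot.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class ToyMemoryDep:
    """Toy stand-in for MemoryDep: a buffer accessed at a known index."""

    name: str
    index: str  # index expression kept as a plain string in this toy
    size: Tuple[int, ...]


@dataclass(frozen=True)
class ToyStarDep:
    """Toy stand-in for StarDep: depends on the whole buffer, no index."""

    name: str


ToyDep = Union[ToyMemoryDep, ToyStarDep]


def deps_match(read: ToyDep, write: ToyDep) -> bool:
    """Simplified toy of the read/write check in scheduler.py (not the real code)."""
    if read.name != write.name or type(read) is not type(write):
        return False  # e.g. a StarDep read never matches a MemoryDep write
    if not isinstance(read, ToyMemoryDep) or not isinstance(write, ToyMemoryDep):
        return False  # this toy only compares index and size for MemoryDeps
    return (
        "tmp" not in read.index  # crude stand-in for free_symbol_has(index, "tmp")
        and "tmp" not in write.index
        and read.index == write.index
        and len(read.size) >= len(write.size)
        and read.size[: len(write.size)] == write.size
    )


# Epilogue (mm -> relu): relu reads mm's output buf0 at the index mm wrote it.
mm_write = ToyMemoryDep("buf0", "64*i0 + i1", (32, 64))
relu_read = ToyMemoryDep("buf0", "64*i0 + i1", (32, 64))
print("mm -> relu:", deps_match(relu_read, mm_write))  # mm -> relu: True

# Prologue (relu -> mm): the mm template records its read of buf1 as a StarDep.
relu_write = ToyMemoryDep("buf1", "64*i0 + i1", (32, 64))
mm_read = ToyStarDep("buf1")
print("relu -> mm:", deps_match(mm_read, relu_write))  # relu -> mm: False
```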

If I want to enable prologue fusion, what is the most reasonable way to extend the Inductor code?

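In scheduler.py, can_fuse_vertical has conditions under which some unmet_dependencies are treated as satisfied anyway. Because of the type(rd) == type(cd) check below, a StarDep can never match a MemoryDep there. I’m wondering whether these conditions could be extended so that a StarDep can be compared with a MemoryDep somehow. For example, could a StarDep be cast to a MemoryDep under certain conditions?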

https://github.com/pytorch/pytorch/blob/main/torch/_inductor/scheduler.py#L2010-L2018

    if (
        rd.name == cd.name
        and type(rd) == type(cd)
        and not free_symbol_has(rd.index, "tmp")
        and not free_symbol_has(cd.index, "tmp")
        and rd.index == cd.index
        and len(rd.size) >= len(cd.size)
        and rd.size[: len(cd.size)] == cd.size
    ):
        ...  # if-body is not included in the linked excerpt
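
One way to picture the "cast StarDep to MemoryDep" idea is the sketch below. It is purely hypothetical: contiguous_index and star_as_memory_dep are made-up helpers over toy dataclasses, not Inductor APIs, indices are plain strings rather than sympy expressions, and the rule they encode (treat a StarDep as a MemoryDep mirroring the producer's write, but only when that write covers the buffer in plain row-major order) is an assumption. It only models the dependency bookkeeping; whether the mm template could actually inline the producer's computation into its loads is a separate question this sketch does not address.

```python
from dataclasses import dataclass
from math import prod
from typing import Optional, Tuple


@dataclass(frozen=True)
class ToyMemoryDep:
    """Toy stand-in for MemoryDep: buffer name, index expression, iteration size."""

    name: str
    index: str
    size: Tuple[int, ...]


@dataclass(frozen=True)
class ToyStarDep:
    """Toy stand-in for StarDep: depends on the whole buffer, no index."""

    name: str


def contiguous_index(size: Tuple[int, ...]) -> str:
    """Row-major index over i0, i1, ..., e.g. (32, 64) -> '64*i0 + i1'."""
    terms = []
    for dim in range(len(size)):
        stride = prod(size[dim + 1 :])
        terms.append(f"i{dim}" if stride == 1 else f"{stride}*i{dim}")
    return " + ".join(terms)


def star_as_memory_dep(read: ToyStarDep, write: ToyMemoryDep) -> Optional[ToyMemoryDep]:
    """Hypothetical cast: mirror the producer's write as a MemoryDep, but only
    when that write covers the buffer in plain row-major order.
    Assumes the buffer's numel equals prod(write.size)."""
    if read.name == write.name and write.index == contiguous_index(write.size):
        return ToyMemoryDep(read.name, write.index, write.size)
    return None  # strided, partial, or unrelated writes: keep the StarDep


relu_write = ToyMemoryDep("buf1", "64*i0 + i1", (32, 64))
strided_write = ToyMemoryDep("buf1", "i0 + 32*i1", (32, 64))
print(star_as_memory_dep(ToyStarDep("buf1"), relu_write))
print(star_as_memory_dep(ToyStarDep("buf1"), strided_write))  # None
```

Requiring a full row-major write is deliberately conservative: a StarDep says nothing about which elements the template reads, so the cast is only defensible when the producer is known to write all of them. Row-major order is just one easy-to-check sufficient condition; the transposed write above also covers the whole buffer but is rejected.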

I’m also wondering whether TemplateBuffer could handle MemoryDep. Currently it records every input read as a StarDep:

https://github.com/pytorch/pytorch/blob/main/torch/_inductor/ir.py#L3327

        name = self.get_name()
        indexer = self.layout.make_indexer()

        def dummy(index, rindex):
            assert len(rindex) == 0
            return ops.store(name, indexer(index), "fake")

        deps = dependencies.extract_read_writes(
            dummy, self.get_size(), (), normalize=True
        )
        deps.reads = {dependencies.StarDep(x.get_name()) for x in self.inputs}
        return deps

    def get_reduction_size(self):
        return 1

    def get_reduction_type(self):
        return None

    def is_no_op(self):
        return False

Any comments would be helpful. Thank you.