I was trying to understand the exact logic for fusing a set of operators before codegen.
Consider this sub-graph:
import torch.nn as nn
import torch.nn.functional as F

class FuseSiLU(nn.Module):
    def __init__(self):
        super().__init__()

    def forward(self, x):
        x = x * F.silu(x)
        return x
I am running this model with the Inductor backend via torch.compile. Profiling it with this flag enabled:
from torch._inductor import config
config.cpp.enable_kernel_profile = True
I can see this fused kernel in the profile log:
graph_0_cpp_fused_mul_silu_
I want to understand where the decision is made to fuse a set of ops before codegen.
So far I’ve looked at some earlier answers about the TorchInductor pattern matcher, but couldn’t pin down the logic for this case. Any help would be much appreciated.