I was trying to understand the exact logic for fusing a set of operators before codegen.
Consider this sub-graph:
import torch.nn as nn
import torch.nn.functional as F

class FuseSiLU(nn.Module):
    def __init__(self):
        super().__init__()

    def forward(self, x):
        x = x * F.silu(x)
        return x
I am running this model with the Inductor backend via torch.compile. Profiling it with this flag enabled:
from torch._inductor import config
config.cpp.enable_kernel_profile = True
I can see this fused kernel in the profile log:
graph_0_cpp_fused_mul_silu_
I want to understand where the decision is made to fuse a set of ops before codegen.
So far I’ve looked at some earlier answers about the TorchInductor pattern matcher, but couldn’t pin down the logic for this case. Any help would be much appreciated.