When does the inductor code run?

Hello, I am new to torch.compile and want to study inductor.
When does inductor's optimization code actually run?

For example, here is my main.py:

import torch

def f(x):
    return torch.sin(x) ** 2 + torch.cos(x) ** 2

# first spot of main.py
compiled_f = torch.compile(f, backend='inductor', options={'trace.enabled':True, 'trace.graph_diagram':True})

# second spot of main.py
output_from_compiled_f = compiled_f(some_input)

Thanks to torch.compile, the nodes of my sin^2 + cos^2 function get merged.

When does this node merging (inductor's optimization logic) happen?

To find out, I inserted some print() calls into the dynamo and inductor source code for debugging.
Example 1:

@ torch/_inductor/compile_fx.py
def compile_fx(
    model_: torch.fx.GraphModule,
    example_inputs_: List[torch.Tensor],
    inner_compile: Callable[..., Any] = compile_fx_inner,
    config_patches: Optional[Dict[str, Any]] = None,
    decompositions: Optional[Dict[OpOverload, Callable[..., Any]]] = None,
):
    """Main entrypoint to a compile given FX graph"""
    print("_____________________________ compile_fx.py ")

Example 2:

@ torch/_inductor/lowering.py
def make_foreach_pointwise(pw_fn, allow_alpha=False):
    print("_____________________________ lowering.py ")

With these prints in place:

  1. Example 2's debug print fires at the first spot of main.py.
  2. Example 1's debug print fires at the second spot of main.py.

I am confused about why example 1's debug print fires at the second spot of main.py.
I assumed that all of inductor's optimization would happen at the first spot of main.py (a kind of compile time for the DNN).
So example 1's debug print should have fired at the first spot of main.py (that is what I expected, but it is not what happens).

Example 1's debug print is located in compile_fx, whose docstring says """Main entrypoint to a compile given FX graph""". So this should be the entrance to the inductor logic.

But why does the inductor (backend compiler) logic start at the second spot of main.py (a kind of run time for the DNN)?

My questions are summarized as follows.
Q1. When is inductor's optimization logic (i.e., node merging for fewer memory accesses, etc.) activated:
at the first spot of main.py or at the second spot of main.py?
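
To make Q1 concrete without patching the PyTorch sources, here is a minimal sketch of how the timing could be checked. It swaps inductor for a custom backend, which torch.compile accepts as a callable taking a GraphModule and example inputs and returning a callable. The names `recording_backend` and `events` are made up for illustration, and this backend performs no optimization at all, so the sketch only shows when a backend gets invoked, not what inductor does inside.

```python
import torch

events = []  # records the order in which things happen

def recording_backend(gm: torch.fx.GraphModule, example_inputs):
    # Hypothetical stand-in for inductor: it only records that it was
    # invoked and returns the captured graph unchanged (no optimization).
    events.append("backend called")
    return gm.forward

def f(x):
    return torch.sin(x) ** 2 + torch.cos(x) ** 2

# first spot of main.py: wrap f
compiled_f = torch.compile(f, backend=recording_backend)
events.append("first spot done")

# second spot of main.py: first real call with an input
out = compiled_f(torch.randn(8))
events.append("second spot done")

print(events)
```

If "backend called" only shows up between the two spot markers, the backend is being invoked lazily on the first call of compiled_f rather than inside torch.compile itself, which would match what I see with the print in compile_fx.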

Q2. What is the sequence of inductor's execution?
I traced the torch source code and found the following execution sequence.

(1) torch/_dynamo/eval_frame.py : def optimize →
(2) torch/_dynamo/eval_frame.py : def _optimize_catch_errors →
(3) torch/_dynamo/eval_frame.py : class OptimizeContext(_TorchDynamoContext): →
(4) torch/_dynamo/eval_frame.py : class _TorchDynamoContext's def __call__ → ???

But I am lost here. Where should I pick up the trace again? (For example, def compile in torch/__init__.py?)
Where can I find the caller of the inductor optimization code?
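
For Q2, instead of reading the sources top-down, one option might be to dump the Python call stack from inside the backend and read the chain bottom-up. The sketch below does this with a stand-in backend (`stack_backend` and `captured` are hypothetical names, not part of PyTorch). The same `traceback` call could presumably be placed next to my print in compile_fx to see who calls it. I am not listing the expected frames, since they depend on the PyTorch version.

```python
import traceback
import torch

captured = []  # (filename, function name) pairs, outermost caller first

def stack_backend(gm: torch.fx.GraphModule, example_inputs):
    # Stand-in for inductor: record who called us, then run the graph as-is.
    for frame in traceback.extract_stack():
        captured.append((frame.filename, frame.name))
    return gm.forward

def f(x):
    return torch.sin(x) ** 2 + torch.cos(x) ** 2

compiled_f = torch.compile(f, backend=stack_backend)
compiled_f(torch.randn(8))  # the backend is invoked during this first call

# Print only the frames that live inside the torch package.
for filename, name in captured:
    if "torch" in filename:
        print(f"{filename} : def {name}")
```

Reading the printed frames from the bottom up should give the path from the backend back to the compiled_f(...) call, which is the part of the sequence I could not reconstruct by hand.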