Triton kernel launch in TorchInductor

I’m trying to understand how TorchInductor schedules its generated Triton kernels for execution. I can see that in the precompile function of CachingAutotuner, kernel binaries and launchers are populated, but I’m not sure where these launchers are actually called from, or how the corresponding cudaLaunchKernel calls get issued.
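
For reference, the shape of what I’m looking at is roughly this (a heavily simplified paraphrase from my reading of torch/_inductor, not the actual source; the stand-in bodies are mine):

```python
# Heavily simplified paraphrase of CachingAutotuner -- my reading of
# torch/_inductor, not the actual source. Bodies are illustrative stand-ins.
class CachingAutotuner:
    def __init__(self, configs):
        self.configs = configs
        self.launchers = []

    def precompile(self):
        # One compiled kernel binary plus one callable "launcher" is
        # populated per autotuning config.
        for cfg in self.configs:
            binary = f"cubin-for-{cfg}"  # stand-in for the compiled kernel
            self.launchers.append(
                lambda grid, b=binary: print(f"would launch {b} on grid {grid}")
            )
```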

Could someone please point me in the right direction?
Thanks in advance!

From what I was able to find, this is handled by Triton during launcher generation for each kernel, in def generate_launcher(constants, signature, ids). It compiles the CUDA launcher code into a shared library and commits it to the codecache.
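
To make that pattern concrete, here is a minimal self-contained sketch of the idea: emit C source for a launcher, compile it into a shared library cached under a hash of the source (the codecache idea), and load the resulting callable. Everything here is illustrative, not Triton’s actual generate_launcher; the real generated C calls cuLaunchKernel on the cubin, which I’ve stubbed out with a printf so the sketch runs anywhere with a C compiler:

```python
# Illustrative sketch only -- not Triton's generate_launcher. Shows the
# "compile a C launcher into a .so, cache it, load it, call it" mechanism.
import ctypes
import hashlib
import os
import subprocess
import tempfile

LAUNCHER_SRC = r"""
#include <stdio.h>
// Stand-in for the real launcher body, which would unpack the kernel
// arguments and call cuLaunchKernel(function, gridX, gridY, gridZ, ...).
void launch(int grid_x, int grid_y, int grid_z) {
    printf("launch grid=(%d, %d, %d)\n", grid_x, grid_y, grid_z);
}
"""

def build_launcher(src: str) -> ctypes.CDLL:
    # Key the cache entry on a hash of the source, the way a codecache would,
    # so later runs can skip the rebuild entirely.
    key = hashlib.sha256(src.encode()).hexdigest()[:16]
    cache_dir = os.path.join(tempfile.gettempdir(), "launcher_cache")
    os.makedirs(cache_dir, exist_ok=True)
    so_path = os.path.join(cache_dir, f"launcher_{key}.so")
    if not os.path.exists(so_path):
        c_path = so_path.replace(".so", ".c")
        with open(c_path, "w") as f:
            f.write(src)
        # Assumes a Unix-like system with "cc" on PATH.
        subprocess.check_call(["cc", "-shared", "-fPIC", "-o", so_path, c_path])
    # Loading the shared library yields the callable "launcher" that gets
    # stored and invoked on every kernel run.
    return ctypes.CDLL(so_path)

launcher = build_launcher(LAUNCHER_SRC)
launcher.launch(128, 1, 1)
```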