torch.compile with a custom Triton kernel

Triton can and does communicate with PyTorch for PTX/cubin codegen. I also see that PyTorch implements a lightweight version of Triton’s CachingAutotuner class, though I’m still a little confused as to who (Triton or PyTorch) actually handles kernel launching at runtime. I asked this in a different post here.
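For concreteness, here’s the kind of setup I mean: a user-defined @triton.jit kernel called from inside a torch.compile’d function. This is just the standard vector-add kernel from the Triton tutorials, not anything from Inductor itself:

    import torch
    import triton
    import triton.language as tl

    @triton.jit
    def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
        # Each program instance handles one BLOCK_SIZE-wide slice.
        pid = tl.program_id(axis=0)
        offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
        mask = offsets < n_elements
        x = tl.load(x_ptr + offsets, mask=mask)
        y = tl.load(y_ptr + offsets, mask=mask)
        tl.store(out_ptr + offsets, x + y, mask=mask)

    @torch.compile
    def compiled_add(x, y):
        out = torch.empty_like(x)
        n = x.numel()
        grid = lambda meta: (triton.cdiv(n, meta["BLOCK_SIZE"]),)
        add_kernel[grid](x, y, out, n, BLOCK_SIZE=1024)
        return out

    x = torch.randn(4096, device="cuda")
    y = torch.randn(4096, device="cuda")
    assert torch.allclose(compiled_add(x, y), x + y)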

AFAIK, the autotuning apparatus is used regardless of whether you’re actually tuning over multiple configs. In the single-config (i.e., no autotune) case, it just compiles one launcher and launches it; see def run here. The multiple-config case is handled separately:

    if len(self.launchers) != 1:
        if len(self.launchers) == 0:
            self.precompile()
        if len(self.launchers) > 1:
            self.autotune_to_one_config(*args, grid=grid, **kwargs)
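Conceptually, that multi-config branch just times each compiled launcher and keeps the fastest one. A rough sketch of the step (self.bench is a hypothetical benchmarking helper here; the real method does more bookkeeping):

    # Sketch only: benchmark every compiled launcher once, then collapse
    # self.launchers down to the single fastest one, so subsequent calls
    # take the fast single-launcher path above.
    def autotune_to_one_config(self, *args, **kwargs):
        timings = {
            launcher: self.bench(launcher, *args, **kwargs)  # hypothetical helper
            for launcher in self.launchers
        }
        self.launchers = [min(timings, key=timings.get)]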

However, I’m not certain at this point whether TorchInductor actually reuses Triton’s JIT runtime or has its own mechanism for launching kernels.
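One way I’ve found to poke at this is to dump the wrapper code Inductor generates, since the actual launch calls show up there. For example, using torch’s logging facility (the compiled function below is just a placeholder):

    import torch

    # Print the Inductor-generated wrapper/kernel source, including
    # how each Triton kernel is launched at runtime.
    torch._logging.set_logs(output_code=True)

    @torch.compile
    def f(x):
        return torch.relu(x) + 1.0

    f(torch.randn(1024, device="cuda"))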