Generate Triton kernels for CPU

Hi,

Is it possible to generate Triton kernels with torch.compile without having a GPU device? Could this be achieved by registering a new Dynamo backend and enabling torch-to-Triton translation for the CPU?

Context: I want to generate Triton kernels from a PyTorch model for a non-GPU device and would like to know where to start. Are there any docs on how this conversion works for GPU devices?

Any ideas on this? If generating Triton for CPU is not available in Inductor, what would be the best way to go from torch to Triton?
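
For reference, this is roughly what I had in mind for the custom-backend route. A minimal sketch, not a working torch-to-Triton lowering: `my_triton_backend` is a hypothetical name, and here it only inspects the captured FX graph and falls back to eager execution.

```python
import torch

# Hypothetical custom Dynamo backend: torch.compile hands us the traced
# FX graph, which is where a torch-to-Triton lowering would plug in.
# This sketch only prints the captured ops and falls back to eager.
def my_triton_backend(gm: torch.fx.GraphModule, example_inputs):
    gm.graph.print_tabular()  # inspect the ops we would lower to Triton
    return gm.forward         # return a callable; eager fallback for now

@torch.compile(backend=my_triton_backend)
def f(x):
    return torch.relu(x) + 1

f(torch.randn(8))
```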

Do you mean running the generated Triton kernels on the CPU? You can set TRITON_INTERPRET=1 in your environment variables, which will run the kernels on the CPU in interpreter (debug) mode. See GitHub - openai/triton: Development repository for the Triton language and compiler.
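
For example, a minimal sketch, assuming a Triton build whose interpreter mode accepts CPU tensors; the variable must be set before triton is imported:

```python
import os
os.environ["TRITON_INTERPRET"] = "1"  # must be set before importing triton

import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK: tl.constexpr):
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK + tl.arange(0, BLOCK)
    mask = offsets < n_elements
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

x = torch.randn(1024)  # plain CPU tensors
y = torch.randn(1024)
out = torch.empty_like(x)
add_kernel[(triton.cdiv(1024, 256),)](x, y, out, 1024, BLOCK=256)
print(torch.allclose(out, x + y))
```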

I mean generating Triton kernels from PyTorch models. Currently, TorchInductor's codegen can do this, but only for GPU devices; for CPU, TorchInductor generates C++/OpenMP code. My question is basically: how can I generate Triton from torch models if I don't have a CUDA device?
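
For context, this is how I've been inspecting what Inductor emits per device (a sketch; with TORCH_COMPILE_DEBUG=1 the generated code gets dumped under ./torch_compile_debug/):

```python
import os
os.environ["TORCH_COMPILE_DEBUG"] = "1"  # set before importing torch

import torch

@torch.compile  # defaults to the inductor backend
def f(x):
    return (x * 2).sin()

# On a CPU tensor, the dumped output_code.py contains C++/OpenMP kernels;
# on a CUDA tensor it would contain Triton kernels instead.
f(torch.randn(32))
```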

Hi, as far as I know, Triton was designed to generate code that runs on GPUs (NVIDIA/Intel/AMD). So it cannot generate code if you do not have a GPU device.