Generate Triton kernels for CPU

Hi,

Is it possible to generate Triton kernels with torch.compile without having a GPU device? Could this be achieved by registering a new Dynamo backend and enabling torch-to-Triton translation for the CPU?

Context: I want to generate Triton kernels from a PyTorch model for a non-GPU device and would like to know where to start. Are there any docs on how this conversion works for GPU devices?

Any ideas on this? If generating Triton for CPU is not available in Inductor, what would be the best way to go from torch to Triton?
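
For reference, this is roughly what I had in mind for the custom-backend route. A minimal sketch, not a working torch-to-Triton lowering: `my_triton_backend` is a hypothetical name, and here it only inspects the captured FX graph and falls back to eager execution.

```python
import torch

# Hypothetical custom Dynamo backend: torch.compile hands us the traced
# FX graph, which is where a torch-to-Triton lowering would plug in.
# This sketch only prints the captured ops and falls back to eager.
def my_triton_backend(gm: torch.fx.GraphModule, example_inputs):
    gm.graph.print_tabular()  # inspect the ops we would lower to Triton
    return gm.forward         # return a callable; eager fallback for now

@torch.compile(backend=my_triton_backend)
def f(x):
    return torch.relu(x) + 1

f(torch.randn(8))
```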

Do you mean running the generated Triton kernels on the CPU? You can set TRITON_INTERPRET=1 in your environment variables, which will run the kernels on the CPU in interpreter (debug) mode. See GitHub - openai/triton: Development repository for the Triton language and compiler.
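
For example, a minimal sketch, assuming a Triton build whose interpreter mode accepts CPU tensors; the variable must be set before triton is imported:

```python
import os
os.environ["TRITON_INTERPRET"] = "1"  # must be set before importing triton

import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK: tl.constexpr):
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK + tl.arange(0, BLOCK)
    mask = offsets < n_elements
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

x = torch.randn(1024)  # plain CPU tensors
y = torch.randn(1024)
out = torch.empty_like(x)
add_kernel[(triton.cdiv(1024, 256),)](x, y, out, 1024, BLOCK=256)
print(torch.allclose(out, x + y))
```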

I mean generating Triton kernels from PyTorch models. Currently, TorchInductor's codegen can do this, but only for GPU devices; for CPU, TorchInductor generates C++/OpenMP code. My question is basically: how can I generate Triton from torch models if I don't have a CUDA device?
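
For context, this is how I've been inspecting what Inductor emits per device (a sketch; with TORCH_COMPILE_DEBUG=1 the generated code gets dumped under ./torch_compile_debug/):

```python
import os
os.environ["TORCH_COMPILE_DEBUG"] = "1"  # set before importing torch

import torch

@torch.compile  # defaults to the inductor backend
def f(x):
    return (x * 2).sin()

# On a CPU tensor, the dumped output_code.py contains C++/OpenMP kernels;
# on a CUDA tensor it would contain Triton kernels instead.
f(torch.randn(32))
```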

Hi, as far as I know, Triton was designed to generate code that runs on GPUs (NVIDIA/Intel/AMD). So it cannot generate code if you do not have a GPU device.