Is it possible to generate Triton kernels with torch.compile without having a GPU device? Could this be achieved by registering a new Dynamo backend and enabling torch-to-Triton translation for the CPU?
Context: I want to generate Triton kernels from a PyTorch model for a non-GPU device and would like to know where to start. Are there any docs on how this conversion works for GPU devices?
To clarify: I mean generating Triton kernels from PyTorch models. Currently, TorchInductor's codegen can do this, but only for GPU devices; for CPU, TorchInductor generates C++/OpenMP code. My question is basically: how can I generate Triton code from torch models if I don't have a CUDA device?
Hi, as far as I know, Triton was designed to generate code that runs on GPUs (Nvidia/Intel/AMD). So it cannot generate code if you do not have a GPU device.