Hello,
Does torch.compile() generate kernels at the Python level (quite abstract), at the C++ level (close to the metal), or in the "assembly language" of the current device (really close to the metal)?
Thanks!
torch.compile generally stays at the Python level. For GPU kernels it usually emits OpenAI Triton code, which Triton then lowers to assembly (e.g. CUDA PTX for CUDA kernels). On CPU, the default Inductor backend generates C++ instead.
Thanks, Matt. That helps!