Where is the bmm source code? Why is the Inductor backend not generating Triton kernels for it?

TorchDynamo supports many different backends, but Inductor specifically works by generating Triton kernels, which we can inspect by running TORCH_COMPILE_DEBUG=1 python trig.py.
But when running the example below, I cannot get a matmul kernel. How is the matmul computed, and where can I find the source code?

import torch

def fn(x, y):
    # matmul on CUDA inputs already returns a CUDA tensor
    return torch.matmul(x, y)

new_fn = torch.compile(fn, backend="inductor")
input_tensor = torch.randn(768, 768, device="cuda:0")
a = new_fn(input_tensor, input_tensor)
print(a)

You can inspect the names of the kernels that are executed via e.g. nsys or nvprof, though I would suspect that a compute-intensive kernel like BGEMM would be dispatched to cuBLAS: pytorch/CUDABlas.cpp at fb468b6792213e0d8e6221b3bb51e71fcadbed30 · pytorch/pytorch · GitHub
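You can also see the kernel names without leaving Python. Here is a minimal sketch using torch.profiler (the exact kernel name you see will depend on your PyTorch/cuBLAS version and GPU):

import torch
from torch.profiler import profile, ProfilerActivity

def fn(x, y):
    return torch.matmul(x, y)

compiled = torch.compile(fn, backend="inductor")
x = torch.randn(768, 768, device="cuda")

compiled(x, x)  # warm-up run so compilation happens outside the profile
with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA]) as prof:
    compiled(x, x)

# The CUDA kernel column typically shows a cuBLAS SGEMM kernel
# rather than a Triton-generated one.
print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=10))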

cuBLAS kernels are in general not open source, though you might find implementations with comparable performance in CUTLASS: GitHub - NVIDIA/cutlass: CUDA Templates for Linear Algebra Subroutines
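If you specifically want Inductor to emit a Triton matmul kernel instead of calling into cuBLAS, one thing to try (behavior depends on your PyTorch version and hardware) is compiling with mode="max-autotune", which lets Inductor benchmark its own Triton GEMM templates against the extern cuBLAS kernel and pick the winner:

import torch

def fn(x, y):
    return torch.matmul(x, y)

# "max-autotune" asks Inductor to autotune Triton GEMM templates against
# the default extern (cuBLAS) kernel; cuBLAS may still win the benchmark.
compiled = torch.compile(fn, mode="max-autotune")
x = torch.randn(768, 768, device="cuda")
print(compiled(x, x))

Running that under TORCH_COMPILE_DEBUG=1 should dump the generated code, including any Triton matmul kernel that was selected.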
