Where is the bmm source code? Why is the Inductor backend not generating Triton kernels for it?

TorchDynamo supports many different backends, but Inductor specifically works by generating Triton kernels, which we can inspect by running TORCH_COMPILE_DEBUG=1 python trig.py.
But when running the example below, I cannot get a matmul kernel. How is the matmul computed, and where can I find the source code?

import torch

def fn(x, y):
    # matmul on CUDA inputs already returns a CUDA tensor
    return torch.matmul(x, y)

new_fn = torch.compile(fn, backend="inductor")
input_tensor = torch.randn(768, 768, device="cuda:0")
a = new_fn(input_tensor, input_tensor)
print(a)

You can inspect the names of the kernels that are executed via e.g. nsys or nvprof, though I would suspect that a compute-intensive kernel like BGEMM would be dispatched to cuBLAS: pytorch/CUDABlas.cpp at fb468b6792213e0d8e6221b3bb51e71fcadbed30 · pytorch/pytorch · GitHub
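You can also see the kernel names without leaving Python. Here is a minimal sketch using torch.profiler (the exact kernel name you see will depend on your PyTorch/cuBLAS version and GPU):

import torch
from torch.profiler import profile, ProfilerActivity

def fn(x, y):
    return torch.matmul(x, y)

compiled = torch.compile(fn, backend="inductor")
x = torch.randn(768, 768, device="cuda")

compiled(x, x)  # warm-up run so compilation happens outside the profile
with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA]) as prof:
    compiled(x, x)

# The CUDA kernel column typically shows a cuBLAS SGEMM kernel
# rather than a Triton-generated one.
print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=10))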

cuBLAS kernels are in general not open source, though you might find implementations with comparable performance in CUTLASS: GitHub - NVIDIA/cutlass: CUDA Templates for Linear Algebra Subroutines
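If you specifically want Inductor to emit a Triton matmul kernel instead of calling into cuBLAS, one thing to try (behavior depends on your PyTorch version and hardware) is compiling with mode="max-autotune", which lets Inductor benchmark its own Triton GEMM templates against the extern cuBLAS kernel and pick the winner:

import torch

def fn(x, y):
    return torch.matmul(x, y)

# "max-autotune" asks Inductor to autotune Triton GEMM templates against
# the default extern (cuBLAS) kernel; cuBLAS may still win the benchmark.
compiled = torch.compile(fn, mode="max-autotune")
x = torch.randn(768, 768, device="cuda")
print(compiled(x, x))

Running that under TORCH_COMPILE_DEBUG=1 should dump the generated code, including any Triton matmul kernel that was selected.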
