Where do I find the fundamental CUDA kernels underlying PyTorch?

PyTorch has highly optimized implementations of standard matrix operations
such as torch.matmul, torch.sparse.mm, …

But where do I find the fundamental CUDA kernels behind these algorithms when I want to contribute to their development?

I have been looking through the PyTorch packages as well as the ATen library, but was not able to find anything of the form

// CUDA kernel function to add the elements of two arrays on the GPU
__global__
void add(int n, float *x, float *y)
{
  for (int i = 0; i < n; i++)
      y[i] = x[i] + y[i];
}

(Example taken from An Even Easier Introduction to CUDA | NVIDIA Technical Blog)

You can find handwritten kernels in aten/src/ATen/native, such as the fused dropout kernel. Matmuls and other standard operations are dispatched to cuBLAS(Lt) and other vendor math libraries, so their kernels are not part of the PyTorch source tree.
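For orientation, this is not actual ATen source, but a self-contained sketch of the grid-stride elementwise pattern that handwritten kernels in aten/src/ATen/native/cuda typically follow (the names add_kernel, n, x, y are made up for illustration), in the style of the NVIDIA blog example above:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Grid-stride loop: every thread handles multiple elements, so the
// kernel works for any n regardless of the launch configuration.
__global__ void add_kernel(int n, const float* x, float* y) {
  int idx = blockIdx.x * blockDim.x + threadIdx.x;
  int stride = gridDim.x * blockDim.x;
  for (int i = idx; i < n; i += stride)
    y[i] = x[i] + y[i];
}

int main() {
  const int n = 1 << 20;
  float *x, *y;
  // Unified memory keeps the example short; real ATen kernels operate
  // on device pointers held by Tensors.
  cudaMallocManaged(&x, n * sizeof(float));
  cudaMallocManaged(&y, n * sizeof(float));
  for (int i = 0; i < n; ++i) { x[i] = 1.0f; y[i] = 2.0f; }

  int threads = 256;
  int blocks = (n + threads - 1) / threads;
  add_kernel<<<blocks, threads>>>(n, x, y);
  cudaDeviceSynchronize();

  printf("y[0] = %f\n", y[0]);
  cudaFree(x);
  cudaFree(y);
}
```

Compile with `nvcc add.cu -o add`. Many ATen elementwise ops don't even write the kernel by hand; they pass a lambda to helpers like TensorIterator-based loops, which generate the launch boilerplate.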
