I am interested in reading about how the CUDA backend for PyTorch works, but I cannot orient myself in the source code. Specifically, I am interested in how the reduction kernels work for operations like sum with axis parameters. Can someone point me towards the documentation for this, or at least where to look in the source code?
I also wonder whether the kernels are JIT compiled. I could probably answer this question myself if someone has a reference.