I would like to know whether there is a way (or any suggestions) to determine which CUDA library call/kernel is invoked by PyTorch. For example, for general matrix-matrix multiplication (GEMM), is there an automated way to obtain the input matrix dimensions and sparsity when a high-level PyTorch API call is lowered to a low-level call, which is further translated into a library call? Where can I intercept this input information, and where is the call to the exact GEMM routine?
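One place this kind of interception is possible is at the ATen dispatch level, before the call reaches cuBLAS/MKL. Below is a minimal sketch (not an official tool) using `TorchDispatchMode` from the private `torch.utils._python_dispatch` module; the op-name filter set and the definition of "sparsity" as the fraction of exact zeros are my own assumptions, to be adapted as needed:

```python
import torch
from torch.utils._python_dispatch import TorchDispatchMode

class GemmLogger(TorchDispatchMode):
    """Sees every ATen op; records shapes/sparsity for matmul-family ops."""
    def __init__(self):
        super().__init__()
        self.records = []

    def __torch_dispatch__(self, func, types, args=(), kwargs=None):
        kwargs = kwargs or {}
        # func._schema.name looks like "aten::mm"; keep the part after "::"
        name = func._schema.name.split("::")[-1]
        if name in {"mm", "addmm", "bmm", "baddbmm"}:
            tensors = [a for a in args if isinstance(a, torch.Tensor)]
            shapes = [tuple(t.shape) for t in tensors]
            # "Sparsity" here = fraction of exact zeros (an assumption)
            sparsity = [float((t == 0).float().mean()) for t in tensors]
            self.records.append((name, shapes, sparsity))
        return func(*args, **kwargs)

a = torch.randn(8, 16)
b = torch.randn(16, 4)
with GemmLogger() as logger:
    c = torch.mm(a, b)  # aten::mm, backed by cuBLAS/MKL GEMM underneath
print(logger.records)
```

This catches the call where PyTorch's dispatcher routes it to a backend kernel, which is typically the last Python-visible point before the cuBLAS/MKL routine.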

Thank you! I have a follow-up question. After reading the code, I could not find the handling of sparse-matrix × dense-matrix multiplication. For example, I can see calls into the cuSPARSE library using NVIDIA's profiling tools. Which file should I look into to find the details of how sparse-matrix × dense-matrix multiplication is handled?
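For reference, here is a small sketch of the operation in question: `torch.sparse.mm` on a sparse COO tensor and a dense tensor. On a CUDA device this is the path that shows up as cuSPARSE calls in a profiler trace (on CPU it runs a native kernel instead); the concrete shapes and values below are just illustrative:

```python
import torch

# A 3x4 sparse COO matrix with three nonzeros, and a 4x3 dense matrix.
indices = torch.tensor([[0, 1, 2],   # row indices
                        [2, 0, 1]])  # column indices
values = torch.tensor([1.0, 2.0, 3.0])
sparse = torch.sparse_coo_tensor(indices, values, (3, 4))
dense = torch.randn(4, 3)

# Sparse @ dense: on GPU tensors this dispatches into the cuSPARSE-backed kernel.
out = torch.sparse.mm(sparse, dense)
print(out.shape)  # (3, 3)
```

Tracing which ATen op name this call dispatches to (e.g. with a profiler or a dispatch mode) is one way to search the source tree for the matching native implementation.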

Thanks! I read through the ATen directory and have a follow-up question: how can I trace a library invocation (e.g., GEMM) down to Intel MKL on CPU or cuBLAS/cuSPARSE on GPU and record the tensor sizes of those library calls? Reading the code base to figure this out seems inefficient. Is there a plug-in/tool that does this automatically and records the tensor sizes of those library calls? I tried the PyTorch profiler's tracing, but it did not give me the tensor sizes and does not seem to report the function names.
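One detail worth checking here: the PyTorch profiler can attach input shapes to each recorded op if `record_shapes=True` is passed. A minimal sketch (CPU-only here so it runs anywhere; add `ProfilerActivity.CUDA` on a GPU machine):

```python
import torch
from torch.profiler import profile, ProfilerActivity

a = torch.randn(32, 64)
b = torch.randn(64, 16)

# record_shapes=True makes the profiler store the input shapes of every op.
with profile(activities=[ProfilerActivity.CPU], record_shapes=True) as prof:
    torch.mm(a, b)

# Group events by (op name, input shapes) and print the GEMM entries.
for evt in prof.key_averages(group_by_input_shape=True):
    if "mm" in evt.key:
        print(evt.key, evt.input_shapes)
```

The profiler reports ATen op names such as `aten::mm` rather than the underlying MKL/cuBLAS symbol, but together with the recorded shapes that usually identifies the GEMM call of interest.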