Is there a way to find out how many times CUDA kernels are called/launched?

For example, say there are five kinds of CUDA kernels in a model (conv, add, etc.), and when it runs in PyTorch these kernels are executed/launched 40 times in total. How could I get that 40 in PyTorch? By using the PyTorch profiler?

Yes, you could use the built-in PyTorch profiler or e.g. Nsight Systems to see summaries of kernel launches as well as the actual timelines.
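As a rough sketch of the profiler route: the snippet below wraps a forward pass in `torch.profiler.profile` and sums the call counts of the `cudaLaunchKernel` runtime events. The helper name `count_kernel_launches`, the toy model, and the input shape are made up for illustration. It also assumes every launch goes through `cudaLaunchKernel`; launches issued through other APIs (e.g. CUDA graphs or driver-level launches) would not be counted, and on a CPU-only machine the result is simply 0.

```python
import torch
from torch.profiler import profile, ProfilerActivity


def count_kernel_launches(fn, launch_names=("cudaLaunchKernel",)):
    """Run fn() under the profiler and sum the calls of the given launch APIs.

    Hypothetical helper for illustration; launch_names is an assumption about
    which runtime calls should be treated as kernel launches.
    """
    use_cuda = torch.cuda.is_available()
    activities = [ProfilerActivity.CPU]
    if use_cuda:
        activities.append(ProfilerActivity.CUDA)

    with profile(activities=activities) as prof:
        fn()
        if use_cuda:
            # Wait for queued kernels so the trace is complete.
            torch.cuda.synchronize()

    # key_averages() groups events by name; .count is the number of calls.
    return sum(evt.count for evt in prof.key_averages() if evt.key in launch_names)


device = "cuda" if torch.cuda.is_available() else "cpu"
# Toy model, only to have something to profile.
model = torch.nn.Sequential(
    torch.nn.Conv2d(3, 8, kernel_size=3),
    torch.nn.ReLU(),
).to(device)
x = torch.randn(1, 3, 32, 32, device=device)

with torch.no_grad():
    model(x)  # warm-up, so one-time initialization is not counted
    launches = count_kernel_launches(lambda: model(x))

print("kernel launches:", launches)
```

To see the per-kernel breakdown rather than a single total, the profiler object inside the helper can also print `prof.key_averages().table()`, which lists each kernel with its number of calls.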

Does the number of calls to cudaLaunchKernel shown in the torch profiler correspond to the number of times kernels were launched?