I want to measure the execution time of each function on CUDA.
I used the torch profiler with activities=[torch.profiler…], but I can only see the total CUDA time, self CUDA time, and average time. What I want to know is the time of the first cudnn_convolution, the second cudnn_convolution, …, and the last one. Is there no way to get the execution time of each call separately?
Or do I need to measure the kernel execution times in TensorBoard one by one?
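For reference, this is roughly what I'm doing (the model and input here are placeholders, not my actual script):

```python
import torch
from torch.profiler import profile, ProfilerActivity

# Placeholder model with two convolutions, standing in for the real network.
model = torch.nn.Sequential(
    torch.nn.Conv2d(3, 8, 3), torch.nn.ReLU(), torch.nn.Conv2d(8, 8, 3)
)
x = torch.randn(1, 3, 32, 32)

activities = [ProfilerActivity.CPU]
if torch.cuda.is_available():
    model, x = model.cuda(), x.cuda()
    activities.append(ProfilerActivity.CUDA)  # profile CUDA kernels too

with profile(activities=activities) as prof:
    model(x)

# key_averages() aggregates all calls with the same name, which is why
# I only see totals and averages, not the first vs. the second convolution.
print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=5))
```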
You could use a visual profiler, such as Nsight Systems, which shows the kernel runtimes etc. directly in the timeline. Alternatively, you could check a summary of the different kernels via
nsys nvprof python script.py, or you could manually measure the runtimes with a timer and synchronizations.
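A minimal sketch of the manual-timing approach (the helper name and iteration counts are my own choices; on GPU it uses CUDA events, with a CPU fallback so it runs anywhere):

```python
import time
import torch

def time_fn(fn, *args, n_warmup=3, n_iters=10):
    """Return the average runtime of fn(*args) in milliseconds."""
    for _ in range(n_warmup):          # warm up (cuDNN autotuning, caches, ...)
        fn(*args)
    if torch.cuda.is_available():
        start = torch.cuda.Event(enable_timing=True)
        end = torch.cuda.Event(enable_timing=True)
        torch.cuda.synchronize()       # make sure no earlier work is pending
        start.record()
        for _ in range(n_iters):
            fn(*args)
        end.record()
        torch.cuda.synchronize()       # wait for all kernels before reading timers
        return start.elapsed_time(end) / n_iters
    t0 = time.perf_counter()           # CPU-only fallback
    for _ in range(n_iters):
        fn(*args)
    return (time.perf_counter() - t0) * 1e3 / n_iters

conv = torch.nn.Conv2d(3, 8, 3)
x = torch.randn(1, 3, 32, 32)
if torch.cuda.is_available():
    conv, x = conv.cuda(), x.cuda()
print(f"conv forward: {time_fn(conv, x):.3f} ms")
```

Timing each layer separately this way gives you the per-call numbers directly, at the cost of instrumenting the code yourself.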
Thank you for your reply.
I already used Nsight Systems and Nsight Compute, but the PyTorch profiler with TensorBoard also shows the kernel runtime (wall duration), right? And each "cudnn_convolution" launches multiple kernels, so I need the integrated kernel runtime, not just one kernel's. I measured each runtime in TensorBoard as (end time of the last kernel - start time of the first kernel).
And I think you mean using time.time() and torch.cuda.synchronize(), but I thought it was better to use TensorBoard, so I used it. Thank you.
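One more note in case it helps someone else: instead of reading the timeline by hand, the per-call times can also be pulled from the profiler's raw event list, which keeps one entry per call rather than averaging same-named ops together (a sketch; exact attribute names may vary between PyTorch versions):

```python
import torch
from torch.profiler import profile, ProfilerActivity

model = torch.nn.Sequential(
    torch.nn.Conv2d(3, 8, 3), torch.nn.ReLU(), torch.nn.Conv2d(8, 8, 3)
)
x = torch.randn(1, 3, 32, 32)

activities = [ProfilerActivity.CPU]
if torch.cuda.is_available():
    model, x = model.cuda(), x.cuda()
    activities.append(ProfilerActivity.CUDA)

with profile(activities=activities) as prof:
    model(x)

# prof.events() returns individual FunctionEvent records, so the first and
# second convolution show up as separate entries with their own durations.
conv_events = [e for e in prof.events() if "conv" in e.name.lower()]
for i, evt in enumerate(conv_events):
    print(i, evt.name, evt.cpu_time_total)  # duration in microseconds
```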