Understanding PyTorch Profiler

I’m currently using torch.autograd.profiler.profile() and torch.autograd.profiler.record_function() from PyTorch Profiler to profile my GPU program. I’m confused by the output of prof.key_averages().table(). The output is organized as follows:

Name | Self CPU total % | Self CPU total | CPU total % | CPU total | CPU time avg | CUDA total % | CUDA total | CUDA time avg | Number of Calls

I don’t fully understand these columns, e.g., the relationship and difference between Self CPU total and CPU total. Can anyone explain them (or point me to some documentation)?

I also find that some items, such as to and index_select, show similar values across Self CPU total, CPU total, and CUDA total. I can understand this for to, since it is the data copy from host to device. But why index_select?
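
For reference, here is a minimal sketch of the kind of run that produces this table. It is CPU-only so that it runs without a GPU, which means the CUDA columns will not be populated. The tensor shapes and the "my_block" label are made up for illustration, and the exact column headers may differ between PyTorch versions.

```python
import torch
from torch.autograd import profiler

# Illustrative inputs (hypothetical sizes, not from my real program).
x = torch.randn(1000, 64)
idx = torch.randint(0, 1000, (256,))

with profiler.profile() as prof:
    # record_function adds a user-defined label that shows up as a row in the table.
    with profiler.record_function("my_block"):
        y = x.index_select(0, idx)

# Aggregate events by name and render them as a text table.
table = prof.key_averages().table(sort_by="self_cpu_time_total")
print(table)
```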

I think Self CPU total is the time spent in the operator itself, excluding the time spent in the operators it calls. For example, conv2d calls convolution. If the CPU total of conv2d is 18.9 ms and the CPU total of convolution is 17.9 ms, then the Self CPU total of conv2d is 18.9 - 17.9 = 1 ms, and the Self CPU total of convolution is 17.9 ms (assuming it calls nothing else).
If an operator doesn’t call any other operator, its Self CPU total is equal to its CPU total.
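
The arithmetic above can be sketched in plain Python. This is only an illustration of the relationship as I understand it, not how the profiler computes it internally; the call tree and timings are the hypothetical numbers from the example.

```python
# Hypothetical CPU totals in milliseconds, taken from the example above.
cpu_total = {"conv2d": 18.9, "convolution": 17.9}

# Hypothetical call tree: conv2d calls convolution, which calls nothing else.
children = {"conv2d": ["convolution"], "convolution": []}

def self_cpu_total(op):
    """CPU total of op minus the CPU total of the ops it calls directly."""
    return cpu_total[op] - sum(cpu_total[child] for child in children[op])

for op in cpu_total:
    print(f"{op}: CPU total = {cpu_total[op]:.1f} ms, "
          f"Self CPU total = {self_cpu_total(op):.1f} ms")
```

An operator with no children, like convolution here, ends up with Self CPU total equal to CPU total.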
That’s my understanding.
Have a good day!