I’m currently using `torch.autograd.profiler.profile()` and `torch.autograd.profiler.record_function()` from the PyTorch Profiler to profile my GPU program. I’m confused by the output of `prof.key_averages().table()`, which is organized as follows:
| Name | Self CPU total % | Self CPU total | CPU total % | CPU total | CPU time avg | CUDA total % | CUDA total | CUDA time avg | Number of Calls |
|---|---|---|---|---|---|---|---|---|---|
I don’t fully understand these columns, e.g., the relationship and difference between Self CPU total and CPU total. Could anyone explain them or point me to the relevant documentation?
Also, I find that some items, such as `to` and `index_select`, show similar values across Self CPU total, CPU total, and CUDA total. I can understand this for `to`, since it stands for a data copy from host to device. But why `index_select`?
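For context, here is a minimal sketch of the kind of setup I mean. It is not my actual program: the tensor sizes and the labels `outer_block` and `inner_select` are placeholders chosen for illustration. The nesting is there so that the outer block has child events, which is the case where a "Self" column and the corresponding total column can differ. The sketch falls back to CPU-only profiling when no GPU is present, and newer PyTorch releases may prefer a different argument than `use_cuda` for enabling CUDA timing.

```python
import torch
from torch.autograd import profiler

# Use the GPU when one is available; otherwise profile on the CPU only.
has_cuda = torch.cuda.is_available()
device = "cuda" if has_cuda else "cpu"

# Placeholder data: a small matrix and some row indices to select.
x = torch.randn(1000, 64)
idx = torch.randint(0, 1000, (256,))

# Only request CUDA timing when a GPU is actually present.
profile_kwargs = {"use_cuda": True} if has_cuda else {}

with profiler.profile(**profile_kwargs) as prof:
    # Outer labelled region: contains the copies and the nested region.
    with profiler.record_function("outer_block"):
        x_dev = x.to(device)
        idx_dev = idx.to(device)
        # Inner labelled region: a child of outer_block in the trace.
        with profiler.record_function("inner_select"):
            out = x_dev.index_select(0, idx_dev)

# Aggregate events by name and render the table discussed above.
table = prof.key_averages().table(sort_by="self_cpu_time_total")
print(table)
```

In the printed table, `outer_block` and `inner_select` appear as rows next to the built-in operators, so the two labelled regions can be compared column by column.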