I’m currently using `torch.autograd.profiler.profile()` and `torch.autograd.profiler.record_function()` from the PyTorch Profiler to profile my GPU program. I’m confused by the output of `prof.key_averages().table()`. The output is organized as follows:
Name | Self CPU total % | Self CPU total | CPU total % | CPU total | CPU time avg | CUDA total % | CUDA total | CUDA time avg | Number of Calls |
---|---|---|---|---|---|---|---|---|---|
I don’t fully understand these columns, e.g., the relationship and difference between `Self CPU total` and `CPU total`. Can anyone explain them (or point me to some documentation)?
Also, I find that some items, such as `to` and `index_select`, have similar values across `Self CPU total`, `CPU total`, and `CUDA total`. I can understand this for `to`, since it stands for a data copy from host to device. But why `index_select`?
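
For context, here is a minimal sketch of the kind of setup I mean. It is not my actual program: the tensor shapes and the `lookup_block` label are made up for illustration, and it falls back to CPU-only profiling when no GPU is available. I pass `use_cuda` because that is the flag I know from `torch.autograd.profiler.profile`; newer releases may warn that it is deprecated.

```python
import torch
from torch.autograd import profiler

# Hypothetical stand-in for the real workload: copy data to the device,
# then gather rows with index_select.
device = "cuda" if torch.cuda.is_available() else "cpu"
source = torch.randn(1000, 64)
indices = torch.randint(0, 1000, (256,))

# use_cuda=True adds the CUDA columns; on a CPU-only machine only the
# CPU columns are reported.
with profiler.profile(use_cuda=(device == "cuda")) as prof:
    # record_function adds a user-named row that wraps the ops inside it.
    with profiler.record_function("lookup_block"):
        x = source.to(device)
        i = indices.to(device)
        out = x.index_select(0, i)

# One aggregated row per op name, sorted by time spent in the op itself.
report = prof.key_averages().table(sort_by="self_cpu_time_total")
print(report)
```

In the printed table, the `to` and `index_select` rows are the ones I am asking about.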