I am looking at the output of torch.utils.bottleneck.
Is there a way to count the “Calls” for identical autograd functions?
The current output shows “1” for Calls on every row, even when the same function (for example add_) appears more than once. This example output is for MNIST with epochs=3:
--------------------------------------------------------------------------------
autograd profiler output (CUDA mode)
--------------------------------------------------------------------------------
top 15 events sorted by cpu_time_total
Because the autograd profiler uses the CUDA event API,
the CUDA time column reports approximately max(cuda_time, cpu_time).
Please ignore this output if your code does not use CUDA.
----------------------------------- --------------- --------------- --------------- --------------- ---------------
Name CPU time CUDA time Calls CPU total CUDA total
----------------------------------- --------------- --------------- --------------- --------------- ---------------
sub_ 10168.735us 78.125us 1 10168.735us 78.125us
addmm 8117.659us 8125.000us 1 8117.659us 8125.000us
th_addmm 8083.499us 8097.656us 1 8083.499us 8097.656us
ThAddmmBackward 6873.064us 6877.930us 1 6873.064us 6877.930us
mm 6760.498us 6802.734us 1 6760.498us 6802.734us
FeatureDropoutBackward 6435.348us 6300.781us 1 6435.348us 6300.781us
mul 6396.920us 6292.969us 1 6396.920us 6292.969us
div 6165.179us 33.081us 1 6165.179us 33.081us
torch::autograd::AccumulateGrad 6107.851us 6109.375us 1 6107.851us 6109.375us
FeatureDropout 6043.199us 5953.125us 1 6043.199us 5953.125us
add_ 5922.450us 5832.031us 1 5922.450us 5832.031us
_th_get_device 5869.107us 5820.312us 1 5869.107us 5820.312us
tensor 5837.294us 5808.594us 1 5837.294us 5808.594us
add_ 5741.874us 5769.531us 1 5741.874us 5769.531us
th_add_ 5716.754us 5757.812us 1 5716.754us 5757.812us
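
To make clear what I am after, here is a rough sketch that post-processes the printed table and groups rows by name. It is only a workaround under some assumptions: the helper `aggregate_calls` is my own hypothetical function (not part of PyTorch), it assumes each data row has exactly six whitespace-separated fields in the order shown above, and it assumes event names contain no spaces. The sample rows are copied from the output above.

```python
from collections import defaultdict


def aggregate_calls(table_text):
    """Group profiler rows by event name, summing Calls and CPU total (in us).

    Hypothetical helper: parses the text table printed above, it does not
    talk to the profiler itself.
    """
    totals = defaultdict(lambda: {"calls": 0, "cpu_total_us": 0.0})
    for line in table_text.splitlines():
        parts = line.split()
        # A data row is: name, CPU time, CUDA time, Calls, CPU total, CUDA total.
        # Header and separator lines fail this check and are skipped.
        if len(parts) != 6 or not parts[1].endswith("us"):
            continue
        name, _, _, calls, cpu_total, _ = parts
        totals[name]["calls"] += int(calls)
        totals[name]["cpu_total_us"] += float(cpu_total[:-2])  # strip "us"
    return dict(totals)


sample = """\
add_ 5922.450us 5832.031us 1 5922.450us 5832.031us
tensor 5837.294us 5808.594us 1 5837.294us 5808.594us
add_ 5741.874us 5769.531us 1 5741.874us 5769.531us
"""

result = aggregate_calls(sample)
# Print names sorted by summed CPU total, largest first.
for name, stats in sorted(result.items(), key=lambda kv: -kv[1]["cpu_total_us"]):
    print(name, stats["calls"], round(stats["cpu_total_us"], 3))
```

With the three sample rows, the two add_ rows collapse into a single entry with Calls = 2. As far as I can tell, running torch.autograd.profiler.profile directly and calling key_averages() on the result gives a similar per-name grouping, but I would like to know whether torch.utils.bottleneck itself can report it.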