Hello, I am trying to compare the execution times of two different models, and I get the following profiler outputs:
MODEL 1
---------------------------------  ----------------  ----------------  ----------------  ----------------  ----------------  ----------------  ----------------  ----------------  ----------------  ----------------  ----------------  ----------------  ----------------
Name                               Self CPU total %  Self CPU total    CPU total %       CPU total         CPU time avg      CUDA total %      CUDA total        CUDA time avg     CPU Mem           Self CPU Mem      CUDA Mem          Self CUDA Mem     Number of Calls
---------------------------------  ----------------  ----------------  ----------------  ----------------  ----------------  ----------------  ----------------  ----------------  ----------------  ----------------  ----------------  ----------------  ----------------
model_inference                    3.78%             10.698ms          100.00%           283.100ms         283.100ms         21.06%            283.099ms         283.099ms         0 b               0 b               1.59 Mb           -5.07 Gb          1
convolution                        0.22%             636.759us         83.34%            235.940ms         4.369ms           16.89%            227.029ms         4.204ms           0 b               0 b               2.49 Gb           0 b               54
_convolution                       2.04%             5.770ms           83.12%            235.304ms         4.357ms           16.87%            226.761ms         4.199ms           0 b               0 b               2.49 Gb           0 b               54
conv2d                             0.23%             657.154us         79.73%            225.702ms         4.340ms           16.07%            215.954ms         4.153ms           0 b               0 b               2.49 Gb           0 b               52
cudnn_convolution                  77.90%            220.547ms         79.52%            225.134ms         6.085ms           15.74%            211.566ms         5.718ms           0 b               0 b               1.63 Gb           -2.74 Mb          37
batch_norm                         0.45%             1.263ms           5.16%             14.609ms          280.941us         2.01%             26.963ms          518.524us         0 b               0 b               2.49 Gb           -158.00 Kb        52
_batch_norm_impl_index             0.83%             2.362ms           4.71%             13.346ms          256.651us         1.97%             26.514ms          509.885us         0 b               0 b               2.49 Gb           0 b               52
lstm                               0.04%             119.700us         4.14%             11.712ms          11.712ms          0.88%             11.866ms          11.866ms          0 b               -8 b              2.08 Mb           0 b               1
conv1d                             0.01%             25.260us          3.86%             10.920ms          5.460ms           0.84%             11.349ms          5.674ms           0 b               0 b               1012.00 Kb        0 b               2
_cudnn_init_dropout_state          3.39%             9.584ms           3.39%             9.595ms           9.595ms           0.71%             9.595ms           9.595ms           0 b               0 b               1.59 Mb           0 b               1
---------------------------------  ----------------  ----------------  ----------------  ----------------  ----------------  ----------------  ----------------  ----------------  ----------------  ----------------  ----------------  ----------------  ----------------
Self CPU time total: 283.100ms
CUDA time total: 1.344s
MODEL 2
---------------------------------  ----------------  ----------------  ----------------  ----------------  ----------------  ----------------  ----------------  ----------------  ----------------  ----------------  ----------------  ----------------  ----------------
Name                               Self CPU total %  Self CPU total    CPU total %       CPU total         CPU time avg      CUDA total %      CUDA total        CUDA time avg     CPU Mem           Self CPU Mem      CUDA Mem          Self CUDA Mem     Number of Calls
---------------------------------  ----------------  ----------------  ----------------  ----------------  ----------------  ----------------  ----------------  ----------------  ----------------  ----------------  ----------------  ----------------  ----------------
model_inference                    4.72%             9.301ms           100.00%           197.095ms         197.095ms         21.51%            197.095ms         197.095ms         0 b               0 b               1.59 Mb           -3.10 Gb          1
convolution                        0.28%             557.852us         77.57%            152.878ms         2.831ms           16.30%            149.410ms         2.767ms           0 b               0 b               1.51 Gb           0 b               54
_convolution                       1.47%             2.896ms           77.28%            152.320ms         2.821ms           16.28%            149.156ms         2.762ms           0 b               0 b               1.51 Gb           0 b               54
cudnn_convolution                  70.81%            139.564ms         72.86%            143.598ms         3.881ms           15.21%            139.411ms         3.768ms           0 b               0 b               1.01 Gb           -154.50 Kb        37
conv2d                             0.29%             573.546us         71.73%            141.381ms         2.719ms           15.03%            137.764ms         2.649ms           0 b               0 b               1.51 Gb           0 b               52
batch_norm                         0.56%             1.110ms           6.39%             12.598ms          242.262us         2.13%             19.510ms          375.183us         0 b               0 b               1.51 Gb           -102.00 Kb        52
conv1d                             0.02%             30.088us          6.14%             12.100ms          6.050ms           1.30%             11.905ms          5.952ms           0 b               0 b               1012.00 Kb        0 b               2
_batch_norm_impl_index             1.10%             2.177ms           5.83%             11.488ms          220.918us         2.08%             19.046ms          366.269us         0 b               0 b               1.51 Gb           0 b               52
lstm                               0.05%             108.296us         5.35%             10.541ms          10.541ms          1.15%             10.540ms          10.540ms          0 b               -8 b              2.08 Mb           0 b               1
empty                              4.18%             8.245ms           4.18%             8.245ms           19.867us          0.71%             6.509ms           15.685us          8 b               8 b               2.65 Gb           2.65 Gb           415
---------------------------------  ----------------  ----------------  ----------------  ----------------  ----------------  ----------------  ----------------  ----------------  ----------------  ----------------  ----------------  ----------------  ----------------
Self CPU time total: 197.095ms
CUDA time total: 916.409ms
Can anyone explain the huge difference between the CUDA total time and the CPU total time?
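A quick arithmetic check on the figures above, as a minimal sketch in plain Python with no profiler calls (the dictionary layout and the function names are made up for illustration). It shows two things that can be read off the tables: the "CUDA total %" of model_inference is its "CUDA total" divided by the "CUDA time total" footer, and the "CUDA total" values of the four convolution rows alone add up to far more than the model_inference row. One possible reading, which is an assumption and not something the output itself confirms, is that the footer adds up rows that are nested inside each other.

```python
# Figures copied from the two tables above, in milliseconds.
# "CUDA time total" for MODEL 1 is printed as 1.344s, i.e. about 1344 ms.
PROFILES = {
    "MODEL 1": {
        "cuda_time_total": 1344.0,
        "model_inference_cuda": 283.099,
        "reported_cuda_pct": 21.06,
        "conv_rows_cuda": {
            "convolution": 227.029,
            "_convolution": 226.761,
            "conv2d": 215.954,
            "cudnn_convolution": 211.566,
        },
    },
    "MODEL 2": {
        "cuda_time_total": 916.409,
        "model_inference_cuda": 197.095,
        "reported_cuda_pct": 21.51,
        "conv_rows_cuda": {
            "convolution": 149.410,
            "_convolution": 149.156,
            "conv2d": 137.764,
            "cudnn_convolution": 139.411,
        },
    },
}


def cuda_pct(profile):
    """Share of the 'CUDA time total' footer taken by the model_inference row."""
    return 100.0 * profile["model_inference_cuda"] / profile["cuda_time_total"]


def conv_rows_sum(profile):
    """Sum of the 'CUDA total' column over the four convolution rows."""
    return sum(profile["conv_rows_cuda"].values())


for name, p in PROFILES.items():
    print(f"{name}: model_inference is {cuda_pct(p):.2f}% of CUDA time total "
          f"(table says {p['reported_cuda_pct']}%)")
    print(f"{name}: the four convolution rows sum to {conv_rows_sum(p):.3f} ms, "
          f"model_inference CUDA total is {p['model_inference_cuda']} ms")
```

In both tables the model_inference row, which wraps the whole forward pass, accounts for only about 21% of "CUDA time total", while its "CUDA total" is almost identical to its "CPU total".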