Hello,
Odd question, but I'm hitting a wall.
I have two very similar models trained against two slightly different objectives. The architectures are nearly identical; the only difference is one small linear layer. I trace and freeze both models with `torch.jit.trace` and deploy them with TorchServe, where one model runs roughly 8x faster than the other. When I load both JIT models in a notebook, however, they run at the same speed and their profiles are very similar. See below.
Any ideas?
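For context, the export path is roughly the following. This is a minimal sketch with a hypothetical stand-in module (`TinyModel` and all its shapes are made up for illustration); the real architectures and the one differing linear layer are not shown in this post.

```python
# Sketch of the trace + freeze export path, using a hypothetical
# stand-in module -- not the real architectures from this post.
import torch
import torch.nn as nn


class TinyModel(nn.Module):
    """Stand-in for either model; only `head` differs between the two."""

    def __init__(self, out_features: int):
        super().__init__()
        self.backbone = nn.Linear(16, 32)
        self.head = nn.Linear(32, out_features)  # the one differing layer

    def forward(self, x):
        return self.head(torch.relu(self.backbone(x)))


model = TinyModel(out_features=8).eval()
example = torch.randn(1, 16)

with torch.no_grad():
    traced = torch.jit.trace(model, example)

# freeze() inlines weights into the graph and strips training-only code.
frozen = torch.jit.freeze(traced)

# The saved .pt file is what gets packaged into the TorchServe model archive.
frozen.save("model.pt")
```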
Timings from TorchServe:
### Model 2
```
inference_1 | 2023-08-03T18:29:28,566 [INFO ] W-9054-XXX-1.0.0-stdout MODEL_LOG - Preprocessing time: 0.02347397804260254
inference_1 | 2023-08-03T18:29:28,566 [INFO ] W-9054-XXX-1.0.0-stdout MODEL_LOG - Inference time: 0.4965062141418457
inference_1 | 2023-08-03T18:29:28,566 [INFO ] W-9054-XXX-1.0.0-stdout MODEL_LOG - Inference time: 0.44716620445251465
inference_1 | 2023-08-03T18:29:28,566 [INFO ] W-9054-XXX-1.0.0-stdout MODEL_LOG - Inference time: 0.23859477043151855
inference_1 | 2023-08-03T18:29:28,566 [INFO ] W-9054-XXX-1.0.0-stdout MODEL_LOG - Inference time: 0.23617815971374512
inference_1 | 2023-08-03T18:29:28,566 [INFO ] W-9054-XXX-1.0.0-stdout MODEL_LOG - Inference time: 0.22520685195922852
inference_1 | 2023-08-03T18:29:28,566 [INFO ] W-9054-XXX-1.0.0-stdout MODEL_LOG - Postprocessing time: 0.00019669532775878906
```
### Model 1
```
inference_1 | 2023-08-03T18:30:25,378 [INFO ] W-9063-yyy-stdout MODEL_LOG - Preprocessing time: 0.02209949493408203
inference_1 | 2023-08-03T18:30:25,379 [INFO ] W-9063-yyy-stdout MODEL_LOG - Inference time: 4.135310888290405
inference_1 | 2023-08-03T18:30:25,379 [INFO ] W-9063-yyy-stdout MODEL_LOG - Inference time: 4.075218915939331
inference_1 | 2023-08-03T18:30:25,379 [INFO ] W-9063-yyy-stdout MODEL_LOG - Inference time: 3.8226993083953857
inference_1 | 2023-08-03T18:30:25,379 [INFO ] W-9063-yyy-stdout MODEL_LOG - Inference time: 3.8330533504486084
inference_1 | 2023-08-03T18:30:25,379 [INFO ] W-9063-yyy-stdout MODEL_LOG - Inference time: 3.826336145401001
inference_1 | 2023-08-03T18:30:25,379 [INFO ] W-9063-yyy-stdout MODEL_LOG - Postprocessing time: 0.000415802001953125
```
Below are profiles from running the models in a notebook.
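The profiles were collected roughly like this. A sketch only: the tiny traced module here is a placeholder for `torch.jit.load()` on the real frozen model files, and the input shape is made up.

```python
# Sketch of the notebook profiling setup; the traced nn.Linear is a
# placeholder for torch.jit.load() on the real frozen models.
import torch
import torch.nn as nn
from torch.profiler import ProfilerActivity, profile, record_function

model = torch.jit.trace(nn.Linear(16, 8).eval(), torch.randn(1, 16))
x = torch.randn(1, 16)

# Warm up first so TorchScript's first-call optimization passes do not
# land inside the measured region.
with torch.no_grad():
    for _ in range(3):
        model(x)

with profile(activities=[ProfilerActivity.CPU]) as prof:
    with record_function("model_inference"):
        with torch.no_grad():
            model(x)

print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=20))
```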
### Model 1
```
--------------------------------  ------------  ------------  ------------  ------------  ------------  ------------
Name                                Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg    # of Calls
--------------------------------  ------------  ------------  ------------  ------------  ------------  ------------
model_inference                          0.99%     214.000us        99.29%      21.511ms      21.511ms             1
forward                                  6.25%       1.353ms        98.27%      21.291ms      21.291ms             1
aten::linear                             2.08%     450.000us        38.81%       8.409ms     158.660us            53
aten::_convolution                       2.90%     628.000us        25.02%       5.420ms     258.095us            21
aten::matmul                             1.86%     402.000us        19.29%       4.180ms     181.739us            23
aten::mkldnn_convolution                16.74%       3.627ms        17.25%       3.738ms     196.737us            19
aten::bmm                               14.09%       3.053ms        14.09%       3.053ms     138.773us            22
aten::addmm                             12.10%       2.621ms        13.68%       2.963ms      92.594us            32
aten::gelu                               6.73%       1.459ms         6.73%       1.459ms      91.188us            16
aten::mul                                6.04%       1.308ms         6.04%       1.308ms      81.750us            16
aten::batch_norm                         0.18%      38.000us         5.81%       1.259ms     157.375us             8
aten::_batch_norm_impl_index             0.24%      53.000us         5.64%       1.221ms     152.625us             8
aten::native_batch_norm                  3.95%     856.000us         5.35%       1.158ms     144.750us             8
aten::group_norm                         0.36%      77.000us         4.06%     880.000us      55.000us            16
aten::native_group_norm                  3.41%     739.000us         3.71%     803.000us      50.188us            16
aten::copy_                              3.48%     754.000us         3.48%     754.000us      13.709us            55
aten::add                                2.59%     562.000us         3.03%     657.000us      26.280us            25
aten::contiguous                         0.12%      26.000us         2.88%     625.000us      48.077us            13
aten::clone                              0.33%      71.000us         2.76%     599.000us      46.077us            13
aten::t                                  1.19%     258.000us         2.12%     460.000us       8.679us            53
--------------------------------  ------------  ------------  ------------  ------------  ------------  ------------
Self CPU time total: 21.665ms
STAGE:2023-08-03 17:57:09 11773:11773 ActivityProfilerController.cpp:294] Completed Stage: Warm Up
STAGE:2023-08-03 17:57:09 11773:11773 ActivityProfilerController.cpp:300] Completed Stage: Collection
```
### Model 2
```
--------------------------------  ------------  ------------  ------------  ------------  ------------  ------------
Name                                Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg    # of Calls
--------------------------------  ------------  ------------  ------------  ------------  ------------  ------------
model_inference                          5.76%       1.453ms        99.87%      25.213ms      25.213ms             1
forward                                  8.52%       2.152ms        94.09%      23.754ms      23.754ms             1
<forward op>                             1.65%     416.000us        41.59%      10.500ms     500.000us            21
aten::linear                             1.53%     387.000us        31.52%       7.958ms     153.038us            52
aten::_convolution                       2.16%     545.000us        28.54%       7.205ms     343.095us            21
aten::matmul                             1.56%     394.000us        14.87%       3.755ms     163.261us            23
aten::mkldnn_convolution                13.77%       3.477ms        14.18%       3.579ms     188.368us            19
aten::addmm                             10.43%       2.633ms        12.07%       3.046ms      98.258us            31
aten::bmm                               10.70%       2.702ms        10.70%       2.702ms     122.818us            22
aten::thnn_conv2d                        0.04%       9.000us         8.58%       2.167ms       1.083ms             2
aten::_slow_conv2d_forward               8.40%       2.120ms         8.55%       2.158ms       1.079ms             2
aten::gelu                               5.75%       1.452ms         5.75%       1.452ms      90.750us            16
aten::_batch_norm_impl_index             0.25%      63.000us         5.01%       1.265ms     140.556us             9
aten::native_batch_norm                  3.52%     888.000us         4.72%       1.191ms     132.333us             9
aten::add_                               4.22%       1.065ms         4.22%       1.065ms      30.429us            35
aten::group_norm                         0.27%      69.000us         3.95%     997.000us      62.312us            16
aten::batch_norm                         0.13%      33.000us         3.81%     961.000us     192.200us             5
aten::native_group_norm                  3.45%     871.000us         3.68%     928.000us      58.000us            16
aten::copy_                              3.42%     864.000us         3.42%     864.000us      16.000us            54
aten::contiguous                         0.33%      83.000us         2.78%     702.000us      54.000us            13
--------------------------------  ------------  ------------  ------------  ------------  ------------  ------------
Self CPU time total: 25.245ms
STAGE:2023-08-03 17:30:03 11773:11773 ActivityProfilerController.cpp:294] Completed Stage: Warm Up
STAGE:2023-08-03 17:30:03 11773:11773 ActivityProfilerController.cpp:300] Completed Stage: Collection
```