Inference speed discrepancies in TorchServe

Hello,

Odd question, but I’m hitting a wall.

I have two very similar models trained with two slightly different objectives. The architectures are nearly identical; the only difference is one small linear layer. I trace and freeze both models with torch.jit.trace and deploy them with TorchServe. One model runs about 8x faster than the other. In a notebook, loading both JIT models, they run at the same speed and their profiles are very similar (see below).
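Roughly, the export for each model looks like this (a sketch; the stand-in module, input shape, and file name are placeholders for my actual models):

```python
import torch

# stand-in for the real model; the input shape and file name are placeholders
model = torch.nn.Sequential(torch.nn.Conv2d(3, 8, 3), torch.nn.ReLU())
example_input = torch.randn(1, 3, 224, 224)

model.eval()
with torch.no_grad():
    traced = torch.jit.trace(model, example_input)
    frozen = torch.jit.freeze(traced)  # inlines weights/attributes into the graph
frozen.save("model.pt")                # this file is what goes into the .mar for TorchServe
```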

Any ideas?

With respect to the timing in TorchServe:

### Model 2
inference_1    | 2023-08-03T18:29:28,566 [INFO ] W-9054-XXX-1.0.0-stdout MODEL_LOG - Preprocessing time: 0.02347397804260254
inference_1    | 2023-08-03T18:29:28,566 [INFO ] W-9054-XXX-1.0.0-stdout MODEL_LOG - Inference time: 0.4965062141418457
inference_1    | 2023-08-03T18:29:28,566 [INFO ] W-9054-XXX-1.0.0-stdout MODEL_LOG - Inference time: 0.44716620445251465
inference_1    | 2023-08-03T18:29:28,566 [INFO ] W-9054-XXX-1.0.0-stdout MODEL_LOG - Inference time: 0.23859477043151855
inference_1    | 2023-08-03T18:29:28,566 [INFO ] W-9054-XXX-1.0.0-stdout MODEL_LOG - Inference time: 0.23617815971374512
inference_1    | 2023-08-03T18:29:28,566 [INFO ] W-9054-XXX-1.0.0-stdout MODEL_LOG - Inference time: 0.22520685195922852
inference_1    | 2023-08-03T18:29:28,566 [INFO ] W-9054-XXX-1.0.0-stdout MODEL_LOG - Postprocessing time: 0.00019669532775878906


### Model 1 
inference_1    | 2023-08-03T18:30:25,378 [INFO ] W-9063-yyy-stdout MODEL_LOG - Preprocessing time: 0.02209949493408203
inference_1    | 2023-08-03T18:30:25,379 [INFO ] W-9063-yyy-stdout MODEL_LOG - Inference time: 4.135310888290405
inference_1    | 2023-08-03T18:30:25,379 [INFO ] W-9063-yyy-stdout MODEL_LOG - Inference time: 4.075218915939331
inference_1    | 2023-08-03T18:30:25,379 [INFO ] W-9063-yyy-stdout MODEL_LOG - Inference time: 3.8226993083953857
inference_1    | 2023-08-03T18:30:25,379 [INFO ] W-9063-yyy-stdout MODEL_LOG - Inference time: 3.8330533504486084
inference_1    | 2023-08-03T18:30:25,379 [INFO ] W-9063-yyy-stdout MODEL_LOG - Inference time: 3.826336145401001
inference_1    | 2023-08-03T18:30:25,379 [INFO ] W-9063-yyy-stdout MODEL_LOG - Postprocessing time: 0.000415802001953125

Below are profiles from running the models in a notebook.
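The profiles were collected roughly like this (a sketch; the saved model path and the input are placeholders):

```python
import torch
from torch.profiler import ProfilerActivity, profile, record_function

# placeholder path/input: the frozen model from above and a representative input
model = torch.jit.load("model.pt")
model.eval()
example_input = torch.randn(1, 3, 224, 224)

with torch.no_grad():
    model(example_input)  # warm-up so JIT optimization passes have run
    with profile(activities=[ProfilerActivity.CPU]) as prof:
        with record_function("model_inference"):
            model(example_input)

print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=20))
```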
Model 1:

--------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  
                            Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg    # of Calls  
--------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  
                 model_inference         0.99%     214.000us        99.29%      21.511ms      21.511ms             1  
                         forward         6.25%       1.353ms        98.27%      21.291ms      21.291ms             1  
                    aten::linear         2.08%     450.000us        38.81%       8.409ms     158.660us            53  
              aten::_convolution         2.90%     628.000us        25.02%       5.420ms     258.095us            21  
                    aten::matmul         1.86%     402.000us        19.29%       4.180ms     181.739us            23  
        aten::mkldnn_convolution        16.74%       3.627ms        17.25%       3.738ms     196.737us            19  
                       aten::bmm        14.09%       3.053ms        14.09%       3.053ms     138.773us            22  
                     aten::addmm        12.10%       2.621ms        13.68%       2.963ms      92.594us            32  
                      aten::gelu         6.73%       1.459ms         6.73%       1.459ms      91.188us            16  
                       aten::mul         6.04%       1.308ms         6.04%       1.308ms      81.750us            16  
                aten::batch_norm         0.18%      38.000us         5.81%       1.259ms     157.375us             8  
    aten::_batch_norm_impl_index         0.24%      53.000us         5.64%       1.221ms     152.625us             8  
         aten::native_batch_norm         3.95%     856.000us         5.35%       1.158ms     144.750us             8  
                aten::group_norm         0.36%      77.000us         4.06%     880.000us      55.000us            16  
         aten::native_group_norm         3.41%     739.000us         3.71%     803.000us      50.188us            16  
                     aten::copy_         3.48%     754.000us         3.48%     754.000us      13.709us            55  
                       aten::add         2.59%     562.000us         3.03%     657.000us      26.280us            25  
                aten::contiguous         0.12%      26.000us         2.88%     625.000us      48.077us            13  
                     aten::clone         0.33%      71.000us         2.76%     599.000us      46.077us            13  
                         aten::t         1.19%     258.000us         2.12%     460.000us       8.679us            53  
--------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  
Self CPU time total: 21.665ms

STAGE:2023-08-03 17:57:09 11773:11773 ActivityProfilerController.cpp:294] Completed Stage: Warm Up
STAGE:2023-08-03 17:57:09 11773:11773 ActivityProfilerController.cpp:300] Completed Stage: Collection

Model 2:

--------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  
                            Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg    # of Calls  
--------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  
                 model_inference         5.76%       1.453ms        99.87%      25.213ms      25.213ms             1  
                         forward         8.52%       2.152ms        94.09%      23.754ms      23.754ms             1  
                    <forward op>         1.65%     416.000us        41.59%      10.500ms     500.000us            21  
                    aten::linear         1.53%     387.000us        31.52%       7.958ms     153.038us            52  
              aten::_convolution         2.16%     545.000us        28.54%       7.205ms     343.095us            21  
                    aten::matmul         1.56%     394.000us        14.87%       3.755ms     163.261us            23  
        aten::mkldnn_convolution        13.77%       3.477ms        14.18%       3.579ms     188.368us            19  
                     aten::addmm        10.43%       2.633ms        12.07%       3.046ms      98.258us            31  
                       aten::bmm        10.70%       2.702ms        10.70%       2.702ms     122.818us            22  
               aten::thnn_conv2d         0.04%       9.000us         8.58%       2.167ms       1.083ms             2  
      aten::_slow_conv2d_forward         8.40%       2.120ms         8.55%       2.158ms       1.079ms             2  
                      aten::gelu         5.75%       1.452ms         5.75%       1.452ms      90.750us            16  
    aten::_batch_norm_impl_index         0.25%      63.000us         5.01%       1.265ms     140.556us             9  
         aten::native_batch_norm         3.52%     888.000us         4.72%       1.191ms     132.333us             9  
                      aten::add_         4.22%       1.065ms         4.22%       1.065ms      30.429us            35  
                aten::group_norm         0.27%      69.000us         3.95%     997.000us      62.312us            16  
                aten::batch_norm         0.13%      33.000us         3.81%     961.000us     192.200us             5  
         aten::native_group_norm         3.45%     871.000us         3.68%     928.000us      58.000us            16  
                     aten::copy_         3.42%     864.000us         3.42%     864.000us      16.000us            54  
                aten::contiguous         0.33%      83.000us         2.78%     702.000us      54.000us            13  
--------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  
Self CPU time total: 25.245ms

STAGE:2023-08-03 17:30:03 11773:11773 ActivityProfilerController.cpp:294] Completed Stage: Warm Up
STAGE:2023-08-03 17:30:03 11773:11773 ActivityProfilerController.cpp:300] Completed Stage: Collection

Your CPU performance will be heavily influenced by launcher core pinning (see Grokking PyTorch Intel CPU performance from first principles — PyTorch Tutorials 2.0.1+cu117 documentation), so I highly recommend giving that a read.

What does your config.properties look like? Make sure to add these two lines:

cpu_launcher_enable=true
cpu_launcher_args=--use_logical_core

Also, are you deploying both models in the same TorchServe instance?

Could you first collect perf data as a baseline by running the two models without TorchServe, and then compare it with the data above?
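Something along these lines would do as a baseline (a sketch; the file names and input shape are placeholders):

```python
import time

import torch

# placeholder paths and input shape; reuse the same frozen models that go into the .mar files
example_input = torch.randn(1, 3, 224, 224)

for path in ["model_1.pt", "model_2.pt"]:
    model = torch.jit.load(path)
    model.eval()
    with torch.no_grad():
        for _ in range(5):   # warm-up so JIT optimization passes have run
            model(example_input)
        n = 20
        start = time.perf_counter()
        for _ in range(n):
            model(example_input)
        print(f"{path}: {(time.perf_counter() - start) / n:.4f} s/iter")
```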

That did it. Thanks for the feedback.

The core pinning? Sweet!

For sure.

I’m trying to optimize the performance a bit more, so I switched to jemalloc. It runs, but I get the following warnings:

<jemalloc>: Invalid conf pair: oversize_threshold:1
<jemalloc>: Invalid conf pair: background_thread:true
<jemalloc>: Invalid conf pair: metadata_thp:auto

Do you think there is a versioning issue between jemalloc and the launcher?

Thanks. :pray:

More logs below.

2023-08-16T00:01:48,190 [WARN ] W-9035-model_1.0.0-stderr MODEL_LOG - 2023-08-16 00:01:48,190 - __main__ - INFO - Use 'jemalloc' memory allocator.
2023-08-16T00:01:48,190 [WARN ] W-9035-model_1.0.0-stderr MODEL_LOG - 2023-08-16 00:01:48,190 - __main__ - INFO - Use 'auto' => 'intel' OpenMP runtime.
2023-08-16T00:01:48,202 [WARN ] W-9035-model_1.0.0-stderr MODEL_LOG - 2023-08-16 00:01:48,202 - __main__ - INFO - Use 'auto' => 'taskset' multi-task manager.
2023-08-16T00:01:48,202 [WARN ] W-9035-model_1.0.0-stderr MODEL_LOG - 2023-08-16 00:01:48,202 - __main__ - INFO - env: Untouched preset environment variables are not displayed.
2023-08-16T00:01:48,202 [WARN ] W-9035-model_1.0.0-stderr MODEL_LOG - 2023-08-16 00:01:48,202 - __main__ - INFO - env: LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so:/usr/lib/x86_64-linux-gnu/libiomp5.so:
2023-08-16T00:01:48,202 [WARN ] W-9035-model_1.0.0-stderr MODEL_LOG - 2023-08-16 00:01:48,202 - __main__ - INFO - env: MALLOC_CONF=oversize_threshold:1,background_thread:true,metadata_thp:auto
2023-08-16T00:01:48,203 [WARN ] W-9035-model_1.0.0-stderr MODEL_LOG - 2023-08-16 00:01:48,202 - __main__ - INFO - env: KMP_BLOCKTIME=1
2023-08-16T00:01:48,203 [WARN ] W-9035-model_1.0.0-stderr MODEL_LOG - 2023-08-16 00:01:48,203 - __main__ - INFO - env: OMP_NUM_THREADS=8
2023-08-16T00:01:48,203 [WARN ] W-9035-model_1.0.0-stderr MODEL_LOG - 2023-08-16 00:01:48,203 - __main__ - INFO - cmd: taskset -c 0-7 /home/venv/bin/python3 -u /home/venv/lib/python3.8/site-packages/ts/model_service_worker.py --sock-type unix --sock-name /tmp/.ts.sock.9035
2023-08-16T00:01:48,208 [WARN ] W-9035-model_1.0.0-stderr MODEL_LOG - <jemalloc>: Invalid conf pair: oversize_threshold:1
2023-08-16T00:01:48,208 [WARN ] W-9035-model_1.0.0-stderr MODEL_LOG - <jemalloc>: Invalid conf pair: background_thread:true
2023-08-16T00:01:48,208 [WARN ] W-9035-model_1.0.0-stderr MODEL_LOG - <jemalloc>: Invalid conf pair: metadata_thp:auto
2023-08-16T00:01:48,209 [WARN ] W-9035-model_1.0.0-stderr MODEL_LOG - <jemalloc>: Invalid conf pair: oversize_threshold:1
2023-08-16T00:01:48,209 [WARN ] W-9035-model_1.0.0-stderr MODEL_LOG - <jemalloc>: Invalid conf pair: background_thread:true
2023-08-16T00:01:48,209 [WARN ] W-9035-model_1.0.0-stderr MODEL_LOG - <jemalloc>: Invalid conf pair: metadata_thp:auto
2023-08-16T00:01:48,210 [WARN ] W-9035-model_1.0.0-stderr MODEL_LOG - <jemalloc>: Invalid conf pair: oversize_threshold:1
2023-08-16T00:01:48,210 [WARN ] W-9035-model_1.0.0-stderr MODEL_LOG - <jemalloc>: Invalid conf pair: background_thread:true
2023-08-16T00:01:48,210 [WARN ] W-9035-model_1.0.0-stderr MODEL_LOG - <jemalloc>: Invalid conf pair: metadata_thp:auto
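From what I can tell, those "Invalid conf pair" warnings usually mean the preloaded jemalloc build is too old for the MALLOC_CONF options the launcher sets (as far as I know, they were added in the 5.x series). A quick check of which build the LD_PRELOAD path above actually resolves to (a sketch):

```python
import os

# the LD_PRELOAD line above points at this path; see which versioned .so it resolves to
path = "/usr/lib/x86_64-linux-gnu/libjemalloc.so"
print(os.path.realpath(path))
# on Ubuntu/Debian, libjemalloc.so.1 is typically the old 3.x series,
# while the 5.x packages ship libjemalloc.so.2
```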

Sorry, I just saw this. Do you mind posting this on our GitHub issues and tagging min-jean-cho?

EDIT: Never mind, I just made an issue myself: jemalloc error · Issue #2582 · pytorch/serve · GitHub
