Using the Quantization tutorial, but the result is different

Here is the page: (beta) Static Quantization with Eager Mode in PyTorch — PyTorch Tutorials 2.6.0+cu124 documentation
In the final speed-testing section, the tutorial mentions:
“Running this locally on a MacBook Pro yielded 61 ms for the regular model, and just 20 ms for the quantized model, illustrating the typical 2-4x speedup we see for quantized models compared to floating-point ones.”
However, when I ran it, I got 15 ms for the regular model and 30 ms for the quantized model. The quantized model runs slower.
Why is this happening? Any help would be appreciated.
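
For reference, here is a minimal timing sketch in the spirit of the tutorial's speed test; the helper name, batch shape, and iteration count are my own assumptions, not the tutorial's exact code:

```python
import time

import torch


def benchmark(model, batch, n_iter=30):
    # Average latency in ms over n_iter forward passes,
    # with one warm-up pass so one-time setup cost is not measured.
    model.eval()
    with torch.no_grad():
        model(batch)
        start = time.perf_counter()
        for _ in range(n_iter):
            model(batch)
        elapsed = time.perf_counter() - start
    return elapsed / n_iter * 1000


# Hypothetical usage; float_model and quantized_model would come from the tutorial.
batch = torch.rand(1, 3, 224, 224)
# print(f"float: {benchmark(float_model, batch):.1f} ms")
# print(f"quantized: {benchmark(quantized_model, batch):.1f} ms")
```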

@BambooKui theoretically this is possible in the following scenario:

  • Your machine's memory was almost exhausted, so when you ran the prediction on the quantized model, a memory bottleneck kicked in

To test this hypothesis (a code sketch follows the list):

  • Save both models to separate files
  • Terminate the Python session
  • Open a new session, load just the quantized model, and check the benchmark timings
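
A minimal sketch of that procedure, assuming both models can be scripted with torch.jit.script as in the tutorial (the file names, variable names, and input shape are placeholders):

```python
import torch

# Session 1: save both models as TorchScript so they can be
# reloaded later without rebuilding the model definition.
# float_model / quantized_model are assumed to exist from the tutorial run.
torch.jit.save(torch.jit.script(float_model), "float_model.pt")
torch.jit.save(torch.jit.script(quantized_model), "quantized_model.pt")

# Session 2 (a fresh Python process): load ONLY the quantized model
# and benchmark it in isolation, e.g. with the timing helper sketched above.
quantized_model = torch.jit.load("quantized_model.pt")
quantized_model.eval()
with torch.no_grad():
    quantized_model(torch.rand(1, 3, 224, 224))
```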

Thank you for your reply. I ran the inference part in a separate process, and the quantized model was about twice as fast as the non-quantized model, but the improvement wasn’t as significant as described in the tutorial.