Slow quantized inference on Cortex-A72

Hi,
I compiled a TFLite FlatBuffer model into a bundle executable, and the int8-quantized build runs significantly slower than the float build.
What could be the reason for this behavior?

  • Device: Raspberry Pi 4 (Cortex-A72)
  • Inference time (min / mean / max; a minimal timing harness is sketched below):
    ◦ Float: 1.41 ms / 1.80 ms / 12.05 ms
    ◦ Int8: 3.14 ms / 3.84 ms / 18.66 ms
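
Numbers like these can be gathered with a loop around the bundle's entry point. This is only a sketch: the symbol name `model`, the buffer sizes, and the extern weight symbol are placeholders; the three-pointer signature follows the usual Glow bundle convention, but the real names and sizes come from the header Glow generates alongside the bundle.

```c
// Minimal timing harness sketch for a Glow bundle on Linux. Link this
// against the compiled bundle object; the symbols below are placeholders.
#include <stdint.h>
#include <stdio.h>
#include <time.h>

// Glow bundles conventionally expose the network as a function taking
// pointers to constant weights, mutable weights (inputs/outputs), and
// scratch activation memory.
extern void model(uint8_t *constantWeight, uint8_t *mutableWeight,
                  uint8_t *activations);
extern uint8_t constantWeight[]; // loaded from the bundle's .weights blob

static uint8_t mutableWeight[1 << 20]; // placeholder size
static uint8_t activations[1 << 20];   // placeholder size

static double now_ms(void) {
  struct timespec ts;
  clock_gettime(CLOCK_MONOTONIC, &ts);
  return ts.tv_sec * 1e3 + ts.tv_nsec / 1e6;
}

int main(void) {
  const int iters = 1000;
  double min = 1e30, max = 0.0, sum = 0.0;
  for (int i = 0; i < iters; ++i) {
    double t0 = now_ms();
    model(constantWeight, mutableWeight, activations);
    double dt = now_ms() - t0;
    if (dt < min) min = dt;
    if (dt > max) max = dt;
    sum += dt;
  }
  printf("min %.2f ms  mean %.2f ms  max %.2f ms\n", min, sum / iters, max);
  return 0;
}
```

Reporting the min alongside mean/max helps separate steady-state kernel cost from scheduler noise, which is likely what the 12 ms and 18 ms outliers are.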

How can I fix that?

Hi @MaxS1996, I'd suggest reading this issue; it will give you a better understanding of why quantization doesn't necessarily improve performance: https://github.com/pytorch/glow/issues/4505
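
To make the linked discussion concrete: the Cortex-A72 is an ARMv8.0-A core without the SDOT/UDOT dot-product instructions, so an int8 kernel does noticeably more arithmetic per element than a float one unless the backend ships hand-tuned quantized kernels. Here is a scalar sketch of the difference; the affine-quantization parameters (zero points, rescale factor) are generic illustrations, not Glow-specific names:

```c
// Scalar sketch of the per-element work in a float vs. an int8 dot product.
#include <stdint.h>

// Float path: one multiply-accumulate per element.
float dot_f32(const float *a, const float *b, int n) {
  float acc = 0.0f;
  for (int i = 0; i < n; ++i)
    acc += a[i] * b[i];
  return acc;
}

// Int8 path: subtract zero points, widen to 32 bits, multiply-accumulate,
// then requantize the int32 accumulator back into the int8 output range.
int8_t dot_i8(const int8_t *a, const int8_t *b, int n,
              int32_t a_zp, int32_t b_zp,
              float rescale, int32_t out_zp) {
  int32_t acc = 0;
  for (int i = 0; i < n; ++i)
    acc += ((int32_t)a[i] - a_zp) * ((int32_t)b[i] - b_zp);
  int32_t out = (int32_t)((float)acc * rescale) + out_zp; // requantize
  if (out > 127) out = 127;   // saturate to the int8 range
  if (out < -128) out = -128;
  return (int8_t)out;
}
```

The float loop compiles to one fused multiply-add per element, while the quantized loop needs widening subtractions and multiplies plus a final rescale and clamp; the bandwidth savings of int8 only pay off once the backend vectorizes that inner loop well.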
