Slow quantized inference on Cortex-A72

Hi,
I compiled a TFLite FlatBuffer model into a bundle executable, and the int8-quantized build runs significantly slower than the float build.
What could be the reason for this behavior?

  • Device: Raspberry Pi 4 (Cortex-A72)
  • Inference time (min / mean / max; a minimal timing harness is sketched below):
    ◦ Float: 1.41 ms / 1.80 ms / 12.05 ms
    ◦ Int8: 3.14 ms / 3.84 ms / 18.66 ms
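
Numbers like these can be gathered with a loop around the bundle's entry point. This is only a sketch: the symbol name `model`, the buffer sizes, and the extern weight symbol are placeholders; the three-pointer signature follows the usual Glow bundle convention, but the real names and sizes come from the header Glow generates alongside the bundle.

```c
// Minimal timing harness sketch for a Glow bundle on Linux. Link this
// against the compiled bundle object; the symbols below are placeholders.
#include <stdint.h>
#include <stdio.h>
#include <time.h>

// Glow bundles conventionally expose the network as a function taking
// pointers to constant weights, mutable weights (inputs/outputs), and
// scratch activation memory.
extern void model(uint8_t *constantWeight, uint8_t *mutableWeight,
                  uint8_t *activations);
extern uint8_t constantWeight[]; // loaded from the bundle's .weights blob

static uint8_t mutableWeight[1 << 20]; // placeholder size
static uint8_t activations[1 << 20];   // placeholder size

static double now_ms(void) {
  struct timespec ts;
  clock_gettime(CLOCK_MONOTONIC, &ts);
  return ts.tv_sec * 1e3 + ts.tv_nsec / 1e6;
}

int main(void) {
  const int iters = 1000;
  double min = 1e30, max = 0.0, sum = 0.0;
  for (int i = 0; i < iters; ++i) {
    double t0 = now_ms();
    model(constantWeight, mutableWeight, activations);
    double dt = now_ms() - t0;
    if (dt < min) min = dt;
    if (dt > max) max = dt;
    sum += dt;
  }
  printf("min %.2f ms  mean %.2f ms  max %.2f ms\n", min, sum / iters, max);
  return 0;
}
```

Reporting the min alongside mean/max helps separate steady-state kernel cost from scheduler noise, which is likely what the 12 ms and 18 ms outliers are.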

How can I fix that?

Hi @MaxS1996, I'd suggest reading this issue; it will give you a better understanding of why quantization doesn't necessarily improve performance: https://github.com/pytorch/glow/issues/4505
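
To make the linked discussion concrete: the Cortex-A72 is an ARMv8.0-A core without the SDOT/UDOT dot-product instructions, so an int8 kernel does noticeably more arithmetic per element than a float one unless the backend ships hand-tuned quantized kernels. Here is a scalar sketch of the difference; the affine-quantization parameters (zero points, rescale factor) are generic illustrations, not Glow-specific names:

```c
// Scalar sketch of the per-element work in a float vs. an int8 dot product.
#include <stdint.h>

// Float path: one multiply-accumulate per element.
float dot_f32(const float *a, const float *b, int n) {
  float acc = 0.0f;
  for (int i = 0; i < n; ++i)
    acc += a[i] * b[i];
  return acc;
}

// Int8 path: subtract zero points, widen to 32 bits, multiply-accumulate,
// then requantize the int32 accumulator back into the int8 output range.
int8_t dot_i8(const int8_t *a, const int8_t *b, int n,
              int32_t a_zp, int32_t b_zp,
              float rescale, int32_t out_zp) {
  int32_t acc = 0;
  for (int i = 0; i < n; ++i)
    acc += ((int32_t)a[i] - a_zp) * ((int32_t)b[i] - b_zp);
  int32_t out = (int32_t)((float)acc * rescale) + out_zp; // requantize
  if (out > 127) out = 127;   // saturate to the int8 range
  if (out < -128) out = -128;
  return (int8_t)out;
}
```

The float loop compiles to one fused multiply-add per element, while the quantized loop needs widening subtractions and multiplies plus a final rescale and clamp; the bandwidth savings of int8 only pay off once the backend vectorizes that inner loop well.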
