I’m transitioning from Python to C++ to deploy a quantized model on a Raspberry Pi 4, and I’m aiming for maximum inference speed. In Python, setting `torch.backends.quantized.engine = 'qnnpack'`
significantly improves my model’s throughput: it boosts it from ~4 FPS to ~45 FPS.
How can I achieve a similar backend switch to QNNPACK in C++?
Is there any C++ equivalent to:
`torch.backends.quantized.engine = 'qnnpack'`?
Thank you in advance,
Ido