I’m transitioning from Python to C++ to deploy a quantized model on a Raspberry Pi 4, and I’m aiming for maximum inference speed. In Python, setting `torch.backends.quantized.engine = 'qnnpack'`
significantly improves my model’s throughput: it boosts it from ~4 FPS to ~45 FPS.
How can I achieve a similar backend switch to QNNPACK in C++?
Is there any C++ equivalent to:
`torch.backends.quantized.engine = 'qnnpack'`?
Thank you in advance,
Ido