RuntimeError: quantized engine QNNPACK is not supported

Hello,

I am trying to deploy my CNN inpainting model to an Android mobile app, so I am following this tutorial:
https://pytorch.org/tutorials/recipes/quantization.html

import torch

model = PartialConvUNet()
backend = 'qnnpack'

model.qconfig = torch.quantization.get_default_qconfig(backend)
torch.backends.quantized.engine = backend  # the RuntimeError below is raised on this line
model_static_quantized = torch.quantization.prepare(model, inplace=False)
# (static quantization normally runs a calibration pass on the prepared model here)
model_static_quantized = torch.quantization.convert(model_static_quantized, inplace=False)
print_size_of_model(model_static_quantized)

When I run this, I get this error:

Traceback (most recent call last):
  File "C:\PycharmProjects\MyCNN\mymodel.py", line 219, in <module>
    torch.backends.quantized.engine = 'qnnpack'
  File "C:\Anaconda3\envs\deeplearning\lib\site-packages\torch\backends\quantized\__init__.py", line 29, in __set__
    torch._C._set_qengine(_get_qengine_id(val))
RuntimeError: quantized engine QNNPACK is not supported

If I set
backend = 'fbgemm'
it works, but this is the backend for x86 servers, which will not be compatible with the Android environment, correct?

This probably means that the machine you are doing quantization on does not support QNNPACK. Could you share what machine and environment you are using, and which PyTorch version?
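One way to check this directly is to ask the installed PyTorch build which quantized engines it was compiled with. The snippet below is a minimal sketch: `pick_qengine` is a hypothetical helper name, not part of PyTorch, while `torch.backends.quantized.supported_engines` is the attribute it reads.

```python
import platform

import torch


def pick_qengine(preferred="qnnpack"):
    """Hypothetical helper: return `preferred` if this build supports it, else None."""
    supported = torch.backends.quantized.supported_engines
    print("PyTorch version:", torch.__version__)
    print("Machine:", platform.machine())
    print("Supported quantized engines:", supported)
    return preferred if preferred in supported else None


engine = pick_qengine("qnnpack")
if engine is not None:
    # Only set the engine when the build actually supports it,
    # which avoids the RuntimeError from the traceback above.
    torch.backends.quantized.engine = engine
else:
    print("qnnpack is not available in this PyTorch build")
```

If `qnnpack` is missing from the printed list, the error comes from how that particular PyTorch binary was built rather than from the model code.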

Hello Vasiliy,

OS: Windows 10 x64
CPU: Intel I9-10885H 2.40GHz
GPU: NVIDIA GeForce 1650Ti
PyTorch Version: 1.10.0.dev20210629

Are you building PyTorch for Windows on your machine, or are you cross-compiling for Android? Since you want to deploy on Android, QNNPACK is definitely supported there. For the open-source Windows build, though, I am not sure. I will check and get back to you.

Yes. I wrote the model in PyTorch on a Windows machine, and I am following this deployment workflow:

Write a model → Quantize → Script/trace → Optimize → Maven(?)
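For reference, the script/trace and optimize steps of that workflow can be sketched roughly as follows. This is only an illustration under assumptions: `TinyNet` and `export_for_mobile` are hypothetical names, the tiny float model stands in for the real network, and the quantize step is left out so the sketch does not depend on which quantized engine is available. The Maven step is not covered.

```python
import os
import tempfile

import torch
import torch.nn as nn
from torch.utils.mobile_optimizer import optimize_for_mobile


class TinyNet(nn.Module):
    """Hypothetical stand-in for the real model."""

    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 8, kernel_size=3, padding=1)

    def forward(self, x):
        return torch.relu(self.conv(x))


def export_for_mobile(model, example_input, out_path):
    """Trace the model, apply mobile optimizations, and save it for the lite interpreter."""
    model.eval()
    traced = torch.jit.trace(model, example_input)   # script/trace step
    optimized = optimize_for_mobile(traced)          # optimize step
    optimized._save_for_lite_interpreter(out_path)   # file to be loaded by the mobile app
    return out_path


out_path = os.path.join(tempfile.mkdtemp(), "model.ptl")
export_for_mobile(TinyNet(), torch.rand(1, 3, 32, 32), out_path)
print("saved", out_path, os.path.getsize(out_path), "bytes")
```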

According to this article, “A developer-friendly guide to model quantization with PyTorch”, I need an ARM CPU:

“Since these libraries are architecture-dependent, static quantization must be performed on a machine with the same architecture as your deployment target. If you are using FBGEMM, you must perform the calibration pass on an x86 CPU (usually not a problem); if you are using QNNPACK, calibration needs to happen on an ARM CPU (this is quite a bit harder).”

Hi @dalseeroh,

“Since these libraries are architecture-dependent, static quantization must be performed on a machine with the same architecture as your deployment target. If you are using FBGEMM, you must perform the calibration pass on an x86 CPU (usually not a problem); if you are using QNNPACK, calibration needs to happen on an ARM CPU (this is quite a bit harder).”

That quote does not seem to be accurate. In general, you can use QNNPACK on x86. This is widely used at Meta, where models are calibrated with QNNPACK on Linux machines for inference on ARM. I think the issues you are hitting on your machine might be specific to your environment.
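As a rough illustration of calibrating with QNNPACK on a non-ARM host, here is a minimal sketch of eager-mode static quantization. `TinyConvNet` and `quantize_static` are hypothetical names, the random tensors stand in for real calibration data, and the example only runs the QNNPACK path when the installed build reports that engine as supported. Newer PyTorch releases expose the same functions under `torch.ao.quantization`.

```python
import torch
import torch.nn as nn


class TinyConvNet(nn.Module):
    """Hypothetical stand-in for the real model, with quant/dequant stubs."""

    def __init__(self):
        super().__init__()
        self.quant = torch.quantization.QuantStub()
        self.conv = nn.Conv2d(3, 8, kernel_size=3, padding=1)
        self.relu = nn.ReLU()
        self.dequant = torch.quantization.DeQuantStub()

    def forward(self, x):
        x = self.quant(x)
        x = self.relu(self.conv(x))
        return self.dequant(x)


def quantize_static(model, calibration_batches, backend="qnnpack"):
    """Prepare, calibrate, and convert `model` using the given quantized engine."""
    if backend not in torch.backends.quantized.supported_engines:
        raise RuntimeError(f"{backend} is not available in this PyTorch build")
    torch.backends.quantized.engine = backend
    model.eval()
    model.qconfig = torch.quantization.get_default_qconfig(backend)
    prepared = torch.quantization.prepare(model, inplace=False)
    with torch.no_grad():
        for batch in calibration_batches:  # calibration pass: observers record activation ranges
            prepared(batch)
    return torch.quantization.convert(prepared, inplace=False)


if "qnnpack" in torch.backends.quantized.supported_engines:
    batches = [torch.rand(1, 3, 32, 32) for _ in range(4)]  # placeholder calibration data
    quantized = quantize_static(TinyConvNet(), batches)
    print(quantized)
else:
    print("qnnpack is not supported by this build; try a different PyTorch binary")
```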