On which machine should I run the INT8 quantization for mobile deployment?


I recently wrote a custom image inpainting model, an altered form of UNet. Now I want to run a demo of this model in an Android environment, on a Samsung Galaxy S10. After a few days of research, I learned that I need to quantize the model for fast mobile performance, and I chose to follow 'Post training static quantization' from this link: Quantization Recipe — PyTorch Tutorials 1.10.0+cu102 documentation. But I have hit a roadblock with one question: on which machine should I run the code below? I wrote this model on x64 Windows 10. Should I run it in the PyCharm IDE, save the model, and then import the model in Android Studio? Can someone give me a bit of guidance?

import torch

backend = "qnnpack"  # mobile backend; "fbgemm" is the x86 server alternative
model.eval()  # post-training static quantization requires eval mode
model.qconfig = torch.quantization.get_default_qconfig(backend)
torch.backends.quantized.engine = backend
model_static_quantized = torch.quantization.prepare(model, inplace=False)
# (calibration: run representative inputs through model_static_quantized here)
model_static_quantized = torch.quantization.convert(model_static_quantized, inplace=False)
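For context, here is a minimal end-to-end sketch of that recipe. The tiny model is a hypothetical stand-in for my UNet (names and sizes are made up); the point is the full flow, including the calibration pass between `prepare` and `convert` and the QuantStub/DeQuantStub markers that the recipe requires:

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the inpainting UNet; only the quantization
# flow matters here. QuantStub/DeQuantStub mark where tensors enter and
# leave the quantized region of the graph.
class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.quant = torch.quantization.QuantStub()
        self.conv = nn.Conv2d(3, 8, kernel_size=3, padding=1)
        self.relu = nn.ReLU()
        self.dequant = torch.quantization.DeQuantStub()

    def forward(self, x):
        x = self.quant(x)
        x = self.relu(self.conv(x))
        return self.dequant(x)

model = TinyNet().eval()  # static quantization requires eval mode

# Fall back to fbgemm if this particular build lacks QNNPACK support.
backend = "qnnpack" if "qnnpack" in torch.backends.quantized.supported_engines else "fbgemm"
torch.backends.quantized.engine = backend
model.qconfig = torch.quantization.get_default_qconfig(backend)

prepared = torch.quantization.prepare(model, inplace=False)

# Calibration: run representative images through the prepared model so the
# observers can record activation ranges (random data as a placeholder).
with torch.no_grad():
    for _ in range(8):
        prepared(torch.randn(1, 3, 32, 32))

quantized = torch.quantization.convert(prepared, inplace=False)
```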

Development Environment:
OS: Windows 10 x64
CPU: Intel I9-10885H 2.40GHz
GPU: NVIDIA GeForce 1650Ti
PyTorch Version: 1.10.0.dev20210629

Target Environment:
Android OS: 4.4+
Device: Samsung Galaxy S10

Have you taken a look at this page: Android | PyTorch?

A lot of deployment is also done through torchscript, you can get some more info here: TorchScript for Deployment — PyTorch Tutorials 1.10.0+cu102 documentation
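A minimal sketch of that TorchScript export flow (the module and filename here are hypothetical placeholders; in practice you would script your converted, quantized model):

```python
import torch
from torch.utils.mobile_optimizer import optimize_for_mobile

# Hypothetical placeholder for the (already quantized) inpainting model.
class Small(torch.nn.Module):
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.relu(x)

scripted = torch.jit.script(Small().eval())

# Apply mobile-specific graph optimizations, then save the file that the
# PyTorch Android library loads on-device with Module.load().
optimized = optimize_for_mobile(scripted)
optimized.save("inpaint_mobile.pt")
```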

I am trying to do quantization-aware training before TorchScript export, to reduce the model size while keeping accuracy. My question is: QNNPACK only runs on ARM architecture, so where should I run this if I am using a Windows machine with an Intel CPU?

Yes, of course. I am still looking for an answer to: where should I run the 'qnnpack' backend if I plan to do quantization-aware training? Jetson Nano? Coral Dev Board? Raspberry Pi? Cross-compile?

QNNPACK is most performant on ARM, but it does run on x86 with the same numerics. You can use any machine whose PyTorch build includes QNNPACK to calibrate your model.
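To check whether your local (x86) build includes QNNPACK before calibrating, something like this works:

```python
import torch

# The quantized engines this PyTorch build was compiled with; desktop
# builds typically list both "fbgemm" (x86-optimized) and "qnnpack"
# (ARM-optimized, but functional on x86 with the same numerics).
engines = torch.backends.quantized.supported_engines
print(engines)

if "qnnpack" in engines:
    torch.backends.quantized.engine = "qnnpack"
    # Quantized ops now dispatch to QNNPACK kernels, even on this x86 box.
    x = torch.quantize_per_tensor(torch.randn(2, 2), scale=0.1,
                                  zero_point=0, dtype=torch.quint8)
```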