Accelerate model on Hexagon (Snapdragon accelerator)

This is a slightly more general question, but I figured this forum is likely where I’d find the developers who could give me a hand.

I’m looking to deploy a slightly modified int8-quantized MobileNetV3 model to Android devices. My goal is to run it accelerated on the Hexagon accelerators present on Snapdragon chips. I’ve made initial tests with PyTorch Mobile’s lite interpreter and NNAPI, which work well on Pixel phones (Tensor G2 chip) but don’t run accelerated on a Samsung Galaxy S21 (Snapdragon 888).
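For reference, my current NNAPI path follows the PyTorch NNAPI tutorial, roughly like the sketch below. The model here is the stock quantized torchvision MobileNetV3 as a stand-in for my modified one, and the quantization scale/zero point are placeholder values:

```python
import torch
import torch.backends._nnapi.prepare
import torchvision.models as models

# Stand-in for my modified int8 MobileNetV3.
model = models.quantization.mobilenet_v3_large(pretrained=True, quantize=True).eval()

# NNAPI expects a quantized, NHWC-contiguous example input; the
# nnapi_nhwc flag tells the converter to emit an NHWC model.
# Scale/zero point here are placeholders, not my real calibration values.
input_float = torch.zeros(1, 3, 224, 224)
input_tensor = torch.quantize_per_tensor(input_float, 0.03, 128, torch.quint8)
input_tensor = input_tensor.contiguous(memory_format=torch.channels_last)
input_tensor.nnapi_nhwc = True

# Trace, convert to an NNAPI-backed module, save for the lite interpreter.
with torch.no_grad():
    traced = torch.jit.trace(model, input_tensor)
nnapi_model = torch.backends._nnapi.prepare.convert_model_to_nnapi(traced, input_tensor)
nnapi_model._save_for_lite_interpreter("mobilenet_v3_nnapi.ptl")
```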

I’ve dug around in various TFLite, PyTorch Lite, and ONNX tutorials, as well as the documentation for NNAPI and the Qualcomm Neural Processing SDK, but I’ve failed to find a deployment path that runs accelerated on at least the last few generations of Snapdragon devices. It seems that each framework only supports a subset of chips: either 855 and earlier (the TFLite Hexagon delegate, for instance) or 888 and later. What are the current best practices for running models accelerated on Android? Is there a way to avoid building separate deploy paths for the various accelerators?

MobileNetV3 is supported by ExecuTorch (with the XNNPACK and Qualcomm HTP backends). ExecuTorch is our new end-to-end solution for enabling on-device AI across mobile and edge devices. The documentation, including tutorials, is here: https://pytorch.org/executorch .
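As a rough sketch of the export flow for the XNNPACK backend (module paths follow the ExecuTorch tutorials and may shift between releases; the Qualcomm HTP backend uses an analogous partitioner but additionally requires the QNN SDK). The stock float MobileNetV3 stands in for your quantized variant:

```python
import torch
import torchvision.models as models

from executorch.backends.xnnpack.partition.xnnpack_partitioner import XnnpackPartitioner
from executorch.exir import to_edge
from torch.export import export

# Stock MobileNetV3 as a stand-in for the modified, quantized variant.
model = models.mobilenet_v3_small(weights="DEFAULT").eval()
sample_inputs = (torch.randn(1, 3, 224, 224),)

# Capture the graph, lower to the edge dialect, and delegate the
# XNNPACK-supported subgraphs to the XNNPACK backend.
exported = export(model, sample_inputs)
edge = to_edge(exported).to_backend(XnnpackPartitioner())

# Serialize to a .pte file that the ExecuTorch Android runtime can load.
executorch_program = edge.to_executorch()
with open("mv3_xnnpack.pte", "wb") as f:
    f.write(executorch_program.buffer)
```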

Please use Issues · pytorch/executorch · GitHub to report bugs and to request enhancements and new features.