Dear team & community,
1. Can anyone describe their workflow for bringing quantized models onto devices (mobile, edge devices, etc.)?
   - AFAIU, the quantization API does not support exporting to ONNX.
2. What do people often do then?
3. Currently, the road feels incomplete, no?
   - 3a. Where can we read the roadmap?
   - 3b. What’s missing to achieve something similar to, e.g., TFLite in PyTorch?
   - 3c. Can you please mention the existing players (rising stars and partners) so that people can better decide how to organize their projects?
Thanks in advance & keep it up!
P.S. Take my words with a dose of honey, and fast-forward past the criticism. I believe AI will be ubiquitous as long as it’s accessible in users’ pockets (e.g., mobile devices & owned hardware), not just run in big cluster farms (e.g., via APIs). Thus, I truly appreciate that PyTorch opened up space for quantization.
How about exporting the model to ONNX first and then quantizing the ONNX model with ONNX Runtime?
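For anyone who wants to try that route, here is a minimal sketch of the two steps. It assumes `torch`, `onnx`, and `onnxruntime` are installed (newer PyTorch releases may additionally need `onnxscript` for the exporter). The function name, file names, and tensor names are made up for illustration, and I have not checked this against YOLOv5 specifically.

```python
from pathlib import Path


def export_and_quantize(model, example_input, out_dir="."):
    """Export a float PyTorch model to ONNX, then quantize the ONNX file.

    Heavy dependencies are imported lazily, so defining this function
    does not require torch or onnxruntime to be present.
    """
    import torch
    from onnxruntime.quantization import QuantType, quantize_dynamic

    out_dir = Path(out_dir)
    fp32_path = out_dir / "model_fp32.onnx"  # hypothetical file name
    int8_path = out_dir / "model_int8.onnx"  # hypothetical file name

    model.eval()
    # Step 1: export the float model (not a PyTorch-quantized one) to ONNX.
    torch.onnx.export(
        model,
        (example_input,),
        str(fp32_path),
        input_names=["input"],    # hypothetical tensor names
        output_names=["output"],
    )

    # Step 2: post-training dynamic quantization; weights are stored as int8.
    quantize_dynamic(str(fp32_path), str(int8_path), weight_type=QuantType.QInt8)
    return fp32_path, int8_path
```

You would call it as `export_and_quantize(my_model, torch.randn(1, 3, 640, 640))`, where the input shape is only a placeholder. Note that the ONNX Runtime docs generally recommend static quantization (`quantize_static` with a calibration data reader) for CNNs, so dynamic quantization here is just the shortest thing to show, not necessarily the right choice for a detector.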
Thanks for chiming in, @111357.
I have read about your suggested path somewhere on the web (forums, GitHub, etc.). Feel free to provide more details about your experience walking that trail (e.g., by linking a GitHub project or mentioning the models).
For the model I’m currently working on (YOLOv5-x), people suggested the following pipeline:
PyTorch → ONNX → SNPE/DLC (Qualcomm-specific SDK & hardware)
Thus, I’m doing most of the development in PyTorch (ML pipeline & quantization biz).
BTW, a colleague also suggested cutting out the middleman, i.e., jumping directly to the hardware SDK (DLC in my case). I didn’t, as it’s my first venture into that frontier.
Hi, I recommend AIMET, a tool supported by Qualcomm.
Here is an example that quantizes a torch model
Hi there, we have a recipe for running quantized models on device: https://pytorch.org/tutorials/recipes/ptmobile_recipes_summary.html
We are working on building an executor stack for exported PyTorch programs to run on device. The release plan to OSS is still in progress, but expect to hear something later this year. cc @Martin_Yuan, @raziel
We’re building a new on-device stack within PyTorch. Quantization is a core concern, as is building the ecosystem to target the heterogeneous hardware landscape.
Right now I cannot share much more than what we presented last December at the PyTorch Conference: https://youtu.be/XJJzJbDEAic
Thanks! I’ll definitely take a look at AIMET in the future.
Have you used QuantizationSimModel.export, mentioned at the end of the QAT tutorial?
- Did you manage to export a useful ONNX file?
- What did your pipeline with SNPE (Snapdragon SDK) look like with the quantized ONNX file?
Apparently, SNPE only generates DLC files out of float32 models. I won’t be surprised if it’s just a matter of casting weights to float32.
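To make "casting weights to float32" concrete: under the usual affine scheme, an int8 code q maps back to a float as scale * (q - zero_point). Below is a minimal pure-Python sketch with made-up scale, zero point, and weights. Whether SNPE expects exactly this is my assumption, not something confirmed in this thread.

```python
def quantize(x, scale, zero_point, qmin=-128, qmax=127):
    """Map a float to an int8 code: q = clamp(round(x / scale) + zero_point)."""
    q = round(x / scale) + zero_point
    return max(qmin, min(qmax, q))


def dequantize(q, scale, zero_point):
    """Map an int8 code back to a float: x ≈ scale * (q - zero_point)."""
    return scale * (q - zero_point)


# Hypothetical values, for illustration only.
scale, zero_point = 0.05, 0
weights = [0.42, -1.3, 0.0, 7.0]

codes = [quantize(w, scale, zero_point) for w in weights]
restored = [dequantize(q, scale, zero_point) for q in codes]

for w, q, r in zip(weights, codes, restored):
    print(f"float {w:+.2f} -> int8 {q:+4d} -> float {r:+.2f}")
```

The restored values are floats again, but they sit on the quantization grid: 0.42 comes back as roughly 0.40, and 7.0 is clamped to roughly 6.35. So the cast changes the dtype while the rounding and clipping error from quantization stays baked into the weights.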
Here are some tutorials for the new quantization API:
PyTorch Mobile will also release ExecuTorch (a native PyTorch platform/runtime for edge models) at the PyTorch Conference (Schedule | Linux Foundation Events), which is 3 weeks away.