My board only supports TFLite models for the classification task (this is mandatory). If I develop with TensorFlow, deploying TFLite on edge devices is straightforward: TensorFlow model (.h5) ==> quantize and save as a .tflite model. TensorFlow supports many quantization techniques, and they are easy to use.
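The TensorFlow flow above can be sketched roughly like this (a minimal post-training-quantization sketch; the tiny Keras model here is a hypothetical stand-in for the trained .h5 classifier):

```python
import tensorflow as tf

# Hypothetical small classifier standing in for the trained .h5 model.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(4,)),
    tf.keras.layers.Dense(8, activation="relu"),
    tf.keras.layers.Dense(3, activation="softmax"),
])

# Post-training quantization: Optimize.DEFAULT enables
# dynamic-range quantization of the weights during conversion.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()

# The serialized flatbuffer is what gets deployed to the board.
with open("model.tflite", "wb") as f:
    f.write(tflite_model)
```

For full integer quantization one would additionally supply a `representative_dataset` to the converter, but the dynamic-range variant above already shows how little code the TensorFlow path needs.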
But now the community widely uses PyTorch as the framework for deep learning, and I want to develop with PyTorch to get a better float32 model. I have read in forums that the workflow for deploying TFLite on edge devices is as follows:
.pt model ==> .onnx model ==> .tflite model
I will call this flow 2.
I have a question: with the PyTorch flow, where in flow 2 can I apply quantization (PTQ or QAT) to get the best performance from the resulting tflite model?
Thank you so much.