I am quite impressed by the current FX-based quantization workflow. However, the hardware I am using only supports TFLite-style quantization. Will the TFLite-style quantization format be supported in the future? Otherwise, if I want to implement a new quantization backend for TFLite, is there any useful documentation that would help me understand what needs to be done?
Unfortunately, we do not have any support for directly porting models from PyTorch to the TFLite format. However, the underlying quantization support is very similar: we support per-tensor and per-row quantization, much like TFLite, and we have very similar operator support.
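To illustrate the similarity, here is a minimal sketch of the asymmetric affine scheme both frameworks share, where `real_value = scale * (quantized_value - zero_point)`. The function names are illustrative, not part of either library's API; this is plain Python to show the math, not a backend implementation.

```python
def quantize_per_tensor(values, num_bits=8):
    """TFLite-style asymmetric affine quantization over a whole tensor.

    One (scale, zero_point) pair covers all values:
        real_value = scale * (quantized_value - zero_point)
    """
    qmin, qmax = 0, 2 ** num_bits - 1
    # The representable range must include 0.0 so that zero is exact.
    lo = min(min(values), 0.0)
    hi = max(max(values), 0.0)
    scale = (hi - lo) / (qmax - qmin) or 1.0  # fall back to 1.0 for all-zero input
    zero_point = round(qmin - lo / scale)
    q = [max(qmin, min(qmax, round(v / scale) + zero_point)) for v in values]
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Recover approximate real values from quantized integers."""
    return [scale * (qi - zero_point) for qi in q]

def quantize_per_row(matrix, num_bits=8):
    """Per-row variant: each row gets its own (scale, zero_point) pair,
    which tightens the range when rows have very different magnitudes."""
    return [quantize_per_tensor(row, num_bits) for row in matrix]
```

Per-tensor quantization uses a single range for everything; the per-row variant simply repeats the same procedure on each row independently, which is why the operator-level support ends up looking so similar across backends.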