Will Pytorch support exporting quantized model?

I see that PyTorch uses ONNX internally to transport a quantized model (produced with the PyTorch quantization API) to Caffe2, and I can export this internal quantized model representation, like below:

As we can see, all operators in the model are custom ops that can be transported directly to Caffe2, but this is not very flexible if we want to use the quantized model as an exchange format.

AFAIK, TensorFlow can export a QAT model that contains FakeQuant ops and transport the model to TFLite. In my opinion, we could export a quantized model that contains only FakeQuant ops (as ONNX custom ops) plus standard ONNX ops. This would make the quantized model much more flexible as an exchange format.
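For reference, this is roughly how a QAT model with fake quant ops is produced on the PyTorch side today — a minimal sketch assuming the eager-mode `torch.quantization` API (the `TinyNet` module is just an illustration):

```python
import torch
import torch.nn as nn

# Illustrative module; QuantStub/DeQuantStub mark the quantized region.
class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.quant = torch.quantization.QuantStub()
        self.conv = nn.Conv2d(1, 1, 1)
        self.dequant = torch.quantization.DeQuantStub()

    def forward(self, x):
        return self.dequant(self.conv(self.quant(x)))

model = TinyNet().train()
model.qconfig = torch.quantization.get_default_qat_qconfig("fbgemm")
torch.quantization.prepare_qat(model, inplace=True)

# After prepare_qat, the conv is swapped for a QAT variant that carries
# a FakeQuantize module for its weights; activations get observers with
# fake quant as well. These are the FakeQuant ops discussed above.
print(hasattr(model.conv, "weight_fake_quant"))
```

If a model in this prepared state could be exported with FakeQuant preserved, downstream toolchains could decide how to lower it.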

What’s your opinion about it? Thanks.


For QAT we have fake quant ops, but for inference we don't; we have quant/dequant ops instead.
If you are talking about preserving quant/dequant ops without fusing them into quantized ops (like quantized::conv2d), we do have this support in graph mode quantization, which will come up pretty soon.
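To illustrate the quant/dequant distinction: in eager-mode post-training quantization, the converted inference model carries explicit quantize/dequantize boundaries rather than FakeQuant ops. A sketch, assuming the `torch.quantization` eager API (the module and calibration input are illustrative):

```python
import torch
import torch.nn as nn

# Illustrative module; the stubs become real quant/dequant ops on convert.
class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.quant = torch.quantization.QuantStub()      # -> quantize op
        self.conv = nn.Conv2d(1, 1, 1)
        self.dequant = torch.quantization.DeQuantStub()  # -> dequantize op

    def forward(self, x):
        return self.dequant(self.conv(self.quant(x)))

model = TinyNet().eval()
model.qconfig = torch.quantization.get_default_qconfig("fbgemm")
torch.quantization.prepare(model, inplace=True)
model(torch.randn(1, 1, 4, 4))  # one calibration pass to collect stats
torch.quantization.convert(model, inplace=True)

# The stubs are now real Quantize/DeQuantize modules, and the conv runs
# as a fused quantized op (quantized::conv2d) between them.
print(type(model.quant).__name__, type(model.dequant).__name__)
```

Keeping those quant/dequant boundaries in the exported graph, instead of only the fused quantized ops, is what would make the model consumable by other backends.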