Can a quantized model be exported and used?

Shani_Gamrian · June 23, 2020, 9:37am

Is it possible to transfer the quantization to Caffe? Let’s say I created a quantized model using PyTorch and now I want to export the model to Caffe. Can I do that using the scale/zero_point parameters, or is it mandatory to use PyTorch for the quantization?

jerryzh168 · June 23, 2020, 4:10pm

You can take a look at ONNX, but we don’t have very good quantization support in ONNX right now, and I’m not sure about the ONNX-to-Caffe path either.

Shani_Gamrian · June 24, 2020, 7:43am

Is the quantization done once, after which the model can be used (with the scale and zero_point), or does it need special support that makes it int8 during inference?

jerryzh168 · June 24, 2020, 7:33pm

Quantization is done before inference; it transforms a floating-point model into a quantized model.
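
For anyone reading later, here is a rough sketch of what "done before inference" means in terms of scale and zero_point. It is plain Python, not PyTorch or Caffe API, and the helper names (`choose_qparams`, `quantize`, `dequantize`, `int_dot`) are made up for illustration. It assumes the common per-tensor affine scheme, q = clamp(round(x / scale) + zero_point, qmin, qmax), with a signed 8-bit range. The integers are computed once ahead of time, but the target runtime still needs integer kernels (like `int_dot` here) to actually run in int8; otherwise it can only dequantize back to float.

```python
# Minimal sketch of per-tensor affine quantization (hypothetical helpers,
# not an API of PyTorch, ONNX, or Caffe).

def choose_qparams(values, qmin=-128, qmax=127):
    """Pick scale/zero_point so [min, max] (widened to include 0) maps to [qmin, qmax]."""
    lo = min(min(values), 0.0)
    hi = max(max(values), 0.0)
    scale = (hi - lo) / (qmax - qmin) or 1.0  # avoid a zero scale for all-zero input
    zero_point = int(round(qmin - lo / scale))
    return scale, max(qmin, min(qmax, zero_point))

def quantize(values, scale, zero_point, qmin=-128, qmax=127):
    """Float -> int8 range: round(x / scale) + zero_point, clamped."""
    return [max(qmin, min(qmax, round(v / scale) + zero_point)) for v in values]

def dequantize(q_values, scale, zero_point):
    """Int -> approximate float: (q - zero_point) * scale."""
    return [(q - zero_point) * scale for q in q_values]

def int_dot(qa, za, qb, zb):
    """Integer-only dot product; the result is rescaled by scale_a * scale_b afterwards."""
    return sum((a - za) * (b - zb) for a, b in zip(qa, qb))

# Done once, before inference: quantize the weights and keep (ints, scale, zero_point).
weights = [-0.5, 0.0, 0.25, 1.0]
w_scale, w_zp = choose_qparams(weights)
q_weights = quantize(weights, w_scale, w_zp)
restored = dequantize(q_weights, w_scale, w_zp)

# At inference: activations are quantized too, and the math runs on integers.
activations = [1.0, 2.0, -1.0, 0.5]
a_scale, a_zp = choose_qparams(activations)
q_acts = quantize(activations, a_scale, a_zp)
approx = int_dot(q_weights, w_zp, q_acts, a_zp) * w_scale * a_scale
exact = sum(w * a for w, a in zip(weights, activations))
```

So the stored integers plus scale and zero_point are, in principle, enough to carry the weights to another framework, but whether that framework computes in int8 depends on its own kernels, which this sketch does not address.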