Is it possible to transfer a quantized model to Caffe? Let’s say I created a quantized model using PyTorch and now want to export it to Caffe. Can I do that using the scale/zero_point parameters, or is it mandatory to use PyTorch for the quantization?
You can take a look at ONNX, but we don’t have very good quantization support in ONNX right now, and I’m not sure about the ONNX-to-Caffe path either.
Is the quantization done once, after which the model can be used (with the scale and zero_point), or does it need special support to make it int8 during inference?
Quantization is done before inference; it transforms a floating-point model into a quantized model.
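To make the role of scale and zero_point concrete, here is a minimal sketch of the usual per-tensor affine mapping, assuming unsigned 8-bit storage (0 to 255). The helper names `quantize` and `dequantize` and the example values are illustrative only; they are not PyTorch or Caffe API.

```python
def quantize(values, scale, zero_point, qmin=0, qmax=255):
    """Map floats to integers: q = clamp(round(x / scale) + zero_point, qmin, qmax)."""
    return [min(qmax, max(qmin, round(v / scale) + zero_point)) for v in values]


def dequantize(q_values, scale, zero_point):
    """Recover approximate floats: x ~= (q - zero_point) * scale."""
    return [(q - zero_point) * scale for q in q_values]


# Hypothetical weights and parameters, chosen only for illustration.
weights = [-1.0, -0.5, 0.0, 0.5, 1.0]
scale, zero_point = 0.01, 128

q_weights = quantize(weights, scale, zero_point)
restored = dequantize(q_weights, scale, zero_point)

print(q_weights)
print(restored)
```

The integer values plus scale and zero_point are enough to reconstruct approximate float weights in any framework. Running the arithmetic itself in int8 is a separate matter: as far as I understand, that depends on the target runtime providing integer kernels, which is the "special support" the question asks about.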