I am a student who studies AI accelerators.
As part of our research, we are experimenting with PyTorch's eager mode quantization.
From the output, I confirmed that the weights and activations are quantized to qint8, but the bias is not.
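For context, here is a minimal sketch of the kind of eager-mode flow we ran (the toy model and variable names are just for illustration, not our actual code). It reproduces the observation: after `convert`, the weight comes out as qint8 while the bias stays FP32.

```python
import torch
import torch.nn as nn

class ToyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.quant = torch.ao.quantization.QuantStub()    # marks quantization of activations
        self.fc = nn.Linear(4, 4)
        self.dequant = torch.ao.quantization.DeQuantStub()

    def forward(self, x):
        x = self.quant(x)
        x = self.fc(x)
        return self.dequant(x)

model = ToyModel().eval()
model.qconfig = torch.ao.quantization.get_default_qconfig("fbgemm")
prepared = torch.ao.quantization.prepare(model)
prepared(torch.randn(2, 4))                               # calibration pass
quantized = torch.ao.quantization.convert(prepared)

print(quantized.fc.weight().dtype)  # torch.qint8  -> weight is quantized
print(quantized.fc.bias().dtype)    # torch.float32 -> bias is NOT quantized
```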
I have a few questions:
1. Isn't the bias usually quantized to INT8 in a quantized model?
2. Is eager mode the recommended way to experiment with an INT8 quantized model? It usually seems to require customization, but I'm not very good at coding.
3. For INT8 quantization experiments, I've read that qint8 values are dequantized to FP32 and then computed (see the small check after this list). If I want to study a purely INT8 path, what approach should I use (eager mode, FX, ONNX, or TensorFlow...)?
4. Also, most backends (fbgemm or other quantization backends) don't seem to support INT8 operations well. Is there a way to get proper INT8 support?
Please share anything you know.