I'm new to this topic. Please provide some clear insight on the following questions.
What is the difference between symmetric and asymmetric quantization?
How do we choose a suitable scheme for our model? Does that depend on the weights or on the quantization dtype?
This whitepaper was written by one of the PyTorch quantization team members and informs a lot of the implementation.
It shows how symmetric quantization is essentially just the case where the zero point is set to 0. Note that the signed vs. unsigned implementation depends on the dtype, i.e. quint8 vs. qint8.
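To make the "zero point set to 0" statement concrete, here is the usual affine quantization mapping written out. The symbols are my own notation rather than anything taken from the whitepaper: $x$ is the float value, $s$ the scale, $z$ the zero point, and $[q_{\min}, q_{\max}]$ the integer range of the dtype (for example $[-128, 127]$ for qint8 and $[0, 255]$ for quint8).

```latex
q = \operatorname{clamp}\!\left(\operatorname{round}\!\left(\frac{x}{s}\right) + z,\; q_{\min},\; q_{\max}\right),
\qquad
\hat{x} = s\,(q - z)
```

With $z = 0$ this reduces to $\hat{x} = s\,q$, so the representable float range is centered on zero. That is the symmetric case; any other $z$ shifts the range and gives the asymmetric (affine) case.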
The best information we have in the documentation is:
That is not great, so I created an issue for this here:
Also, if you want a code definition, note that symmetric quantization is generally handled as a special case of affine quantization; the only thing that changes is how the qparams (scale and zero point) are calculated. Here is where that happens:
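As a rough illustration of that difference (this is not the actual PyTorch source), here is a minimal pure-Python sketch of how the qparams could be computed from an observed min/max in each scheme. The function names `affine_qparams`, `symmetric_qparams` and `quantize` are hypothetical, and using the midpoint of the range as the zero point for an unsigned dtype is an assumption of this sketch.

```python
def affine_qparams(min_val, max_val, quant_min, quant_max):
    """Asymmetric (affine) case: map [min_val, max_val] onto the integer range."""
    # Extend the range so that 0.0 is always exactly representable.
    min_val = min(min_val, 0.0)
    max_val = max(max_val, 0.0)
    scale = (max_val - min_val) / (quant_max - quant_min)
    if scale == 0.0:
        scale = 1.0  # degenerate all-zero input; any positive scale works
    zero_point = quant_min - round(min_val / scale)
    # Keep the zero point inside the integer range.
    zero_point = max(quant_min, min(quant_max, zero_point))
    return scale, zero_point


def symmetric_qparams(min_val, max_val, quant_min, quant_max):
    """Symmetric case: the float range is [-max_abs, max_abs], centered on zero."""
    max_abs = max(abs(min_val), abs(max_val))
    scale = 2.0 * max_abs / (quant_max - quant_min)
    if scale == 0.0:
        scale = 1.0
    # Signed range (e.g. qint8): zero point is 0.
    # Unsigned range (e.g. quint8): assumed here to be the midpoint of the range.
    zero_point = 0 if quant_min < 0 else (quant_min + quant_max + 1) // 2
    return scale, zero_point


def quantize(x, scale, zero_point, quant_min, quant_max):
    """The quantize step is identical for both schemes; only the qparams differ."""
    q = round(x / scale) + zero_point
    return max(quant_min, min(quant_max, q))


observed_min, observed_max = -1.0, 3.0
print("affine, quint8-like range:   ", affine_qparams(observed_min, observed_max, 0, 255))
print("symmetric, qint8-like range: ", symmetric_qparams(observed_min, observed_max, -128, 127))
print("symmetric, quint8-like range:", symmetric_qparams(observed_min, observed_max, 0, 255))
```

For the skewed range [-1, 3], the affine scheme spends the whole integer range on the observed values (zero point 64), while the symmetric scheme covers [-3, 3] and leaves part of the integer range unused. That trade-off is the main thing to weigh when choosing between the two for a tensor whose values are not centered on zero.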