About Quantization schemes

Himajyothi_Rajamahen · November 17, 2021, 7:25am

Hi,

I’m new to this topic. Please provide some clear insight into the following questions.

What is the difference between symmetric and asymmetric quantization?
How do we choose a suitable scheme for our model? Does that depend on the weights or on the quantization dtype?

Thanks

ZimoNitrome · November 17, 2021, 10:57am

TensorFlow has a good section on this: TensorFlow Lite 8-bit quantization specification

HDCharles · November 17, 2021, 9:17pm

This whitepaper was written by one of the PyTorch quantization team members and informs a lot of the implementation.

It shows how symmetric quantization is essentially just the case where the zero point is set to 0. Note that the signed vs. unsigned implementation depends on the dtype, i.e. quint8 vs. qint8.
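To make the difference concrete, here is a minimal plain-Python sketch of how the two sets of qparams could be computed for a single tensor range. It loosely follows the shape of the observer code linked further down, but the helper names (`affine_qparams`, `symmetric_qparams`, `quantize`, `dequantize`), the `EPS` value, and the hard-coded zero point of 128 for an unsigned 8-bit range are assumptions for illustration, not the PyTorch API.

```python
# Illustrative sketch only: plain-Python qparam calculation for one tensor range.
# The helper names below are hypothetical and are not part of the PyTorch API.

EPS = 1e-8  # assumed lower bound on scale, to avoid dividing by zero


def affine_qparams(min_val, max_val, quant_min, quant_max):
    """Asymmetric (affine): map [min_val, max_val] onto [quant_min, quant_max]."""
    # Extend the float range so that 0.0 is exactly representable.
    min_val = min(min_val, 0.0)
    max_val = max(max_val, 0.0)
    scale = max((max_val - min_val) / float(quant_max - quant_min), EPS)
    zero_point = quant_min - round(min_val / scale)
    zero_point = max(quant_min, min(quant_max, zero_point))
    return scale, zero_point


def symmetric_qparams(min_val, max_val, quant_min, quant_max):
    """Symmetric: the float range is centred on 0.0, so the zero point is fixed."""
    max_abs = max(-min(min_val, 0.0), max(max_val, 0.0))
    scale = max(max_abs / (float(quant_max - quant_min) / 2), EPS)
    # Signed 8-bit range (qint8-like): zero point is 0.
    # Unsigned 8-bit range (quint8-like): zero point is 128 (assumed midpoint).
    zero_point = 0 if quant_min < 0 else 128
    return scale, zero_point


def quantize(x, scale, zero_point, quant_min, quant_max):
    """Float -> integer, clamped to the quantized range."""
    q = round(x / scale) + zero_point
    return max(quant_min, min(quant_max, q))


def dequantize(q, scale, zero_point):
    """Integer -> approximate float."""
    return (q - zero_point) * scale


if __name__ == "__main__":
    lo, hi = -1.0, 3.0  # a skewed example range
    for name, fn in (("affine", affine_qparams), ("symmetric", symmetric_qparams)):
        scale, zp = fn(lo, hi, -128, 127)
        print(name, "scale =", scale, "zero_point =", zp)
```

For a skewed range such as [-1, 3], the affine scheme ends up with a smaller scale (finer resolution) because it uses the whole integer range, while the symmetric scheme spends part of the integer range on values below -1 that never occur. Which one is appropriate in practice can also depend on what the backend supports for weights and activations.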

The best information we have in the documentation is:

https://pytorch.org/docs/stable/torch.quantization.html#torch-quantization

This is not great, so I created an issue for it here:

github.com/pytorch/pytorch

Improve documentation about meaning of different qschemes

opened 09:13PM - 17 Nov 21 UTC

HDCharles

oncall: quantization

What exactly is symmetric and affine quantization? The best documentation is …kind of a side point here: https://pytorch.org/docs/stable/torch.quantization.html?highlight=torch%20minmaxobserver#torch.quantization.MinMaxObserver. There should be a clearer explanation of exactly what symmetric and affine quantization mean, rather than having to work backwards from the calculation of the qparams. cc @jerryzh168 @jianyuh @raghuramank100 @jamesr66a @vkuzo

Also, if you want a code definition, note that symmetric quantization is generally handled as a special case of affine quantization; the only difference is how the qparams are calculated. Here is where that happens:

github.com

pytorch/pytorch/blob/b0bdf588ea575928a94264c30999385d5ff2bc32/torch/ao/quantization/observer.py#L283

    
      
        return torch.tensor([1.0], device=min_val.device.type), torch.tensor([0], device=min_val.device.type)

    quant_min, quant_max = self.quant_min, self.quant_max
    min_val_neg = torch.min(min_val, torch.zeros_like(min_val))
    max_val_pos = torch.max(max_val, torch.zeros_like(max_val))

    device = min_val_neg.device
    scale = torch.ones(min_val_neg.size(), dtype=torch.float32, device=device)
    zero_point = torch.zeros(min_val_neg.size(), dtype=torch.int64, device=device)

    if (
        self.qscheme == torch.per_tensor_symmetric
        or self.qscheme == torch.per_channel_symmetric
    ):
        max_val_pos = torch.max(-min_val_neg, max_val_pos)
        scale = max_val_pos / (float(quant_max - quant_min) / 2)
        scale = torch.max(scale, self.eps)
        if self.dtype == torch.quint8:
            if self.has_customized_qrange:
                # When customized quantization range is used, down-rounded midpoint of the range is chosen.
                zero_point = zero_point.new_full(