How does fake_quantize_per_tensor_affine work?

I was wondering what operation this program performs. Does it round with the straight-through estimator (STE) or without it?

    X = torch.fake_quantize_per_tensor_affine(
        X, self.scale.item(), int(self.zero_point.item()),
        self.quant_min, self.quant_max)

My quantization process is the same as in this GitHub repo: outlier_suppression/util_quant.py at commit ae5b5b48e781cd55631128d8ecd746198e6839e4 in wimh966/outlier_suppression:

def fake_quantize_per_tensor_affine(x, scale, zero_point, quant_min, quant_max):
    # round_ste rounds in the forward pass but lets gradients pass through
    x_int = round_ste(x / scale) + zero_point
    x_quant = torch.clamp(x_int, quant_min, quant_max)
    x_dequant = (x_quant - zero_point) * scale
    return x_dequant
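(round_ste is defined elsewhere in that file; a minimal sketch of the standard straight-through rounding trick, assuming that is what the repo's round_ste does:)

import torch

def round_ste(x):
    # Forward: plain rounding. Backward: identity, because the
    # (x.round() - x) correction term is detached from the autograd
    # graph, so gradients flow straight through to x.
    return (x.round() - x).detach() + x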

What I'm unclear on is what PyTorch itself does: does torch.fake_quantize_per_tensor_affine also use STE in its backward pass?

Hi @Rami_Ismael, yes, STE is used in the derivative of fake_quantize_per_tensor_affine: gradients pass through unchanged where the input falls inside the quantization range, and are zeroed where the value was clamped.
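You can verify this with autograd; a minimal sketch (the tensor values are just illustrative):

import torch

x = torch.tensor([0.3, 1.7, 20.0], requires_grad=True)
# scale 0.1, zero_point 0, int8 range: representable values are [-12.8, 12.7]
y = torch.fake_quantize_per_tensor_affine(x, 0.1, 0, -128, 127)
y.sum().backward()

# STE: gradient is 1 where the input stayed inside the quant range,
# and 0 where it was clamped (20.0 falls outside [-12.8, 12.7]).
print(x.grad)  # tensor([1., 1., 0.])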