I would like to know what operation the following call performs. Does it do its rounding with a straight-through estimator (STE) or without one?
X = torch.fake_quantize_per_tensor_affine(
    X, self.scale.item(), int(self.zero_point.item()), self.quant_min, self.quant_max)
Is the PyTorch quantization process the same as this implementation from the GitHub repo wimh966/outlier_suppression (outlier_suppression/util_quant.py at ae5b5b48e781cd55631128d8ecd746198e6839e4)?
def fake_quantize_per_tensor_affine(x, scale, zero_point, quant_min, quant_max):
    x_int = round_ste(x / scale) + zero_point
    x_quant = torch.clamp(x_int, quant_min, quant_max)
    x_dequant = (x_quant - zero_point) * scale
    return x_dequant
I am unclear on this. How does PyTorch implement it?
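One way to check this empirically is to run both versions on the same input and compare the outputs and the gradients. The sketch below is only an illustration under a few assumptions: `round_ste` is written here as the usual detach trick (I am assuming the repo's helper behaves the same way), `manual_fake_quant` is a hypothetical name for the repo-style function, and the scale is chosen as a power of two so that `x / scale` and `x * (1 / scale)` give identical floats.

```python
import torch


def round_ste(x):
    # Assumed STE rounding: the forward pass returns round(x),
    # the backward pass treats rounding as the identity.
    return (x.round() - x).detach() + x


def manual_fake_quant(x, scale, zero_point, quant_min, quant_max):
    # Hypothetical name; same steps as the repo-style function quoted above.
    x_int = round_ste(x / scale) + zero_point
    x_quant = torch.clamp(x_int, quant_min, quant_max)
    return (x_quant - zero_point) * scale


# Illustrative parameters (assumptions, not taken from any real model).
scale, zero_point, quant_min, quant_max = 0.25, 0, -8, 7
# The first and last values fall outside the representable range.
values = [-3.1, -1.9, -0.3, 0.1, 0.6, 1.3, 1.7, 2.9]

# Built-in op: forward pass plus gradient with respect to the input.
x_builtin = torch.tensor(values, requires_grad=True)
y_builtin = torch.fake_quantize_per_tensor_affine(
    x_builtin, scale, zero_point, quant_min, quant_max)
y_builtin.sum().backward()

# Manual version with explicit STE rounding.
x_manual = torch.tensor(values, requires_grad=True)
y_manual = manual_fake_quant(x_manual, scale, zero_point, quant_min, quant_max)
y_manual.sum().backward()

print("outputs match:  ", torch.allclose(y_builtin, y_manual))
print("gradients match:", torch.allclose(x_builtin.grad, x_manual.grad))
print("built-in grad:  ", x_builtin.grad)
```

If the built-in op rounded without an STE, its input gradient would be zero everywhere, since plain rounding has zero gradient almost everywhere. With a clipped STE, the gradient should instead be 1 for elements whose rounded value lies inside [quant_min, quant_max] and 0 for the clipped ones, which is what the manual version produces.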