I was implementing quantization in PyTorch and I noticed something that seemed off: why does quantizing a tensor to the dtype torch.quint8 result in a quantized tensor whose printed values have a sign?
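A minimal example of what I'm seeing (the input values and quantization parameters are just ones I made up):

```python
import torch

x = torch.tensor([-1.0, 0.0, 0.5])
q = torch.quantize_per_tensor(x, scale=0.1, zero_point=10, dtype=torch.quint8)

print(q.dtype)  # torch.quint8, i.e. nominally unsigned
print(q)        # yet the printed values include negative numbers
```

My best guess is that printing shows the dequantized floats rather than the stored unsigned integers, but I'd like to confirm that.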

@jcaip Interesting. I’m aware that much of the quantization process is handled in C++, but isn’t there a Python representation for a quantized tensor? My aim is to build a small framework on top of some of the abstractions PyTorch provides, and in doing so I’ve found it hard to understand how PyTorch performs some quantization operations. It’s been challenging, since the documentation is light on details.

Would you have any references on how the quantization process works on torch’s side? Specifically, I’m curious about how torch.quantize_per_tensor() applies the scale and zero_point. Is the formula x * s + z? Or is it x / s - z, etc.? I stumbled upon a brief article that seemingly outlines PyTorch’s approach to tensor quantization. Yet, without a way to view the resulting tensor, how can I be sure the quantization parameters I’m passing are correct for my purposes?
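For concreteness, here is the kind of check I'd like to be able to do. I'm assuming int_repr() returns the raw stored integers, and that the formula quantize_per_tensor applies is q = round(x / scale) + zero_point, clamped to the dtype's range (please correct me if that's wrong):

```python
import torch

x = torch.tensor([-1.0, 0.0, 0.5, 1.0])
scale, zero_point = 0.1, 10  # made-up parameters for illustration

q = torch.quantize_per_tensor(x, scale, zero_point, dtype=torch.quint8)

# Raw stored unsigned integers: round(x / scale) + zero_point, clamped to [0, 255]
print(q.int_repr())   # tensor([ 0, 10, 15, 20], dtype=torch.uint8)

# Round trip back to floats: (q - zero_point) * scale
print(q.dequantize())

# The parameters are also recoverable from the tensor itself
print(q.q_scale(), q.q_zero_point())
```

If that formula holds, comparing int_repr() against a hand computation would let me verify my parameters directly.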