Why does an unsigned torch.quint8 tensor have a sign?

I was implementing quantization in PyTorch and I noticed something that seemed off. Why does quantizing a tensor with the dtype torch.quint8 result in a quantized tensor that appears to have a sign?

To reproduce:

import torch

x = torch.quantize_per_tensor(torch.tensor(
    [-1.0, 0.0, 1.0, 2.0]), 0.1, 10, torch.quint8)
print(x)

Output:

tensor([-1.,  0.,  1.,  2.], size=(4,), dtype=torch.quint8,
       quantization_scheme=torch.per_tensor_affine, scale=0.1, zero_point=10)

Hi @Andrew_Holmes, this is because the values you see printed are the dequantized (floating-point) values, not the underlying integer storage.

@jcaip Interesting. I’m aware that much of the quantization process is handled in C++, but isn’t there a Python representation for a quantized tensor? My aim is to build a small framework using some of the abstractions PyTorch provides, and in doing so I’ve been finding it hard to understand how PyTorch performs some quantization operations. It’s been kind of challenging since the documentation isn’t very detailed.

Would you have any references on how the quantization process works on torch’s side? Specifically, I’m curious about how torch.quantize_per_tensor() applies the scale and zero_point. Is the formula x * s + z? Or is it x / s - z, etc.? I stumbled upon a brief article that seemingly outlines PyTorch’s approach to tensor quantization. Yet, without a way to view the resulting tensor, how can I be sure the quantization parameters I’m passing are correct for my purposes?

For quantize_per_tensor, it’s the first formula (x * s + z)

Sorry, I think I misunderstood your initial question. If you just want to see the quantized values, you should be able to call x.int_repr().
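
For example, on the tensor from the original post, int_repr() should show the stored unsigned integers (a quick sketch, assuming the same scale=0.1 and zero_point=10):

import torch

x = torch.quantize_per_tensor(torch.tensor(
    [-1.0, 0.0, 1.0, 2.0]), 0.1, 10, torch.quint8)

# The underlying uint8 storage: round(x / 0.1) + 10 -> [0, 10, 20, 30]
print(x.int_repr())

Expected output:

tensor([ 0, 10, 20, 30], dtype=torch.uint8)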


@jcaip This helps a ton! Thanks a lot.

Hello, I think it’s actually applying the following: x / s + z (rounded to the nearest integer). Here is why:

import torch
scale = 0.5
zero = 1

tensor = torch.tensor([[4, 6, 8]]).float()
q_tensor = torch.quantize_per_tensor(
    tensor, scale, zero, dtype=torch.qint8)
print(q_tensor.int_repr())

Output:
tensor([[ 9, 13, 17]], dtype=torch.int8)

If it were multiplicative, it would be: tensor([[ 3, 4, 5]], dtype=torch.int8)
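
As a cross-check, here is a small sketch (my own, not from the PyTorch docs) comparing the stored integers against round(x / s) + z and confirming that dequantize() applies (q - z) * s:

import torch

scale = 0.5
zero = 1
tensor = torch.tensor([[4.0, 6.0, 8.0]])

q_tensor = torch.quantize_per_tensor(tensor, scale, zero, dtype=torch.qint8)

# Quantize: q = round(x / scale) + zero_point
expected = torch.round(tensor / scale) + zero
print(q_tensor.int_repr())    # tensor([[ 9, 13, 17]], dtype=torch.int8)
print(expected)               # tensor([[ 9., 13., 17.]])

# Dequantize: x_hat = (q - zero_point) * scale recovers the inputs exactly here
print(q_tensor.dequantize())  # tensor([[4., 6., 8.]])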

Yep, it's in our docs in a few places, e.g. torch.fake_quantize_per_tensor_affine — PyTorch 2.1 documentation
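
The formula documented there is (clamp(round(x / scale) + zero_point, quant_min, quant_max) - zero_point) * scale. A short sketch of torch.fake_quantize_per_tensor_affine, with values I picked to make the rounding loss visible:

import torch

x = torch.tensor([0.23, 1.0, 2.0])
scale, zero_point = 0.1, 10

# Quantize to the uint8 range [0, 255] and dequantize in one step.
y = torch.fake_quantize_per_tensor_affine(x, scale, zero_point, 0, 255)

# 0.23 -> round(2.3) + 10 = 12 -> (12 - 10) * 0.1 = 0.2 (rounding loss)
print(y)   # tensor([0.2000, 1.0000, 2.0000])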