Why does an unsigned torch.quint8 tensor have a sign?

I was implementing quantization in PyTorch and I noticed something that seemed off. Why does quantizing a tensor with the dtype torch.quint8 result in a quantized tensor that appears to have a sign?

To reproduce:

import torch

x = torch.quantize_per_tensor(torch.tensor(
    [-1.0, 0.0, 1.0, 2.0]), 0.1, 10, torch.quint8)
print(x)

Output:

tensor([-1.,  0.,  1.,  2.], size=(4,), dtype=torch.quint8,
       quantization_scheme=torch.per_tensor_affine, scale=0.1, zero_point=10)

Hi @Andrew_Holmes, this is because the values you see printed are the dequantized (floating-point) values, not the underlying integer storage.

@jcaip Interesting. I’m aware that much of the quantization process is handled in C++, but isn’t there a Python representation for a quantized tensor? My aim is to build a small framework using some of the abstractions PyTorch provides, and in doing so I’ve been finding it hard to understand how PyTorch performs some quantization operations. It’s been kind of challenging since the documentation isn’t very detailed.

Would you have any references on how the quantization process works on torch’s side? Specifically, I’m curious about how torch.quantize_per_tensor() applies the scale and zero_point. Is the formula x * s + z? Or is it x / s - z, etc.? I stumbled upon a brief article that seemingly outlines PyTorch’s approach to tensor quantization. Yet, without a way to view the resulting tensor, how can I be sure the quantization parameters I’m passing are correct for my purposes?

For quantize_per_tensor, it’s the first formula (x * s + z)

Sorry, I think I misunderstood your initial question. If you just want to see the quantized values, you should be able to call x.int_repr().
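
For example, on the tensor from the original post, int_repr() should show the stored unsigned integers (a quick sketch, assuming the same scale=0.1 and zero_point=10):

import torch

x = torch.quantize_per_tensor(torch.tensor(
    [-1.0, 0.0, 1.0, 2.0]), 0.1, 10, torch.quint8)

# The underlying uint8 storage: round(x / 0.1) + 10 -> [0, 10, 20, 30]
print(x.int_repr())

Expected output:

tensor([ 0, 10, 20, 30], dtype=torch.uint8)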


@jcaip This helps a ton! Thanks a lot.

Hello, I think it’s actually applying the following: x / s + z (rounded to the nearest integer). Here is why:

import torch
scale = 0.5
zero = 1

tensor = torch.tensor([[4, 6, 8]]).float()
q_tensor = torch.quantize_per_tensor(
    tensor, scale, zero, dtype=torch.qint8)
print(q_tensor.int_repr())

Output:
tensor([[ 9, 13, 17]], dtype=torch.int8)

If it were multiplicative, it would be: tensor([[ 3, 4, 5]], dtype=torch.int8)
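
As a cross-check, here is a small sketch (my own, not from the PyTorch docs) comparing the stored integers against round(x / s) + z and confirming that dequantize() applies (q - z) * s:

import torch

scale = 0.5
zero = 1
tensor = torch.tensor([[4.0, 6.0, 8.0]])

q_tensor = torch.quantize_per_tensor(tensor, scale, zero, dtype=torch.qint8)

# Quantize: q = round(x / scale) + zero_point
expected = torch.round(tensor / scale) + zero
print(q_tensor.int_repr())    # tensor([[ 9, 13, 17]], dtype=torch.int8)
print(expected)               # tensor([[ 9., 13., 17.]])

# Dequantize: x_hat = (q - zero_point) * scale recovers the inputs exactly here
print(q_tensor.dequantize())  # tensor([[4., 6., 8.]])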

Yep, it's in our docs in a few places, e.g. torch.fake_quantize_per_tensor_affine — PyTorch 2.1 documentation
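
The formula documented there is (clamp(round(x / scale) + zero_point, quant_min, quant_max) - zero_point) * scale. A short sketch of torch.fake_quantize_per_tensor_affine, with values I picked to make the rounding loss visible:

import torch

x = torch.tensor([0.23, 1.0, 2.0])
scale, zero_point = 0.1, 10

# Quantize to the uint8 range [0, 255] and dequantize in one step.
y = torch.fake_quantize_per_tensor_affine(x, scale, zero_point, 0, 255)

# 0.23 -> round(2.3) + 10 = 12 -> (12 - 10) * 0.1 = 0.2 (rounding loss)
print(y)   # tensor([0.2000, 1.0000, 2.0000])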