Please, how do I perform static quantization on a single tensor? I want the steps to go from a float tensor to a quantized tensor (no math, just PyTorch code).
Do you have any larger context on what you are trying to do? We have a lot of quantize/dequantize ops that you can call, but they may produce different tensors, e.g.
torch.quantize_per_tensor (torch.quantize_per_tensor — PyTorch 2.1 documentation):
the old quantize op that gives you a quantized tensor in PyTorch with quint8/qint8 etc. dtypes.
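For reference, a minimal sketch of that old op in action (the `scale` and `zero_point` values here are chosen by hand purely for illustration):

```python
import torch

x = torch.tensor([0.0, 0.5, 1.0, 1.5])

# quantize: stores scale/zero_point inside the quantized tensor itself
q = torch.quantize_per_tensor(x, scale=0.1, zero_point=0, dtype=torch.qint8)

q.dtype          # torch.qint8
q.int_repr()     # the underlying int8 values: [0, 5, 10, 15]
q.q_scale()      # 0.1
q.q_zero_point() # 0

# dequantize: map back to float using the stored parameters
x_hat = q.dequantize()
```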
We are moving away from the above and want to use native PyTorch integer tensors directly:
this returns a uint8/int8 PyTorch tensor, and it doesn't store the scale/zero_point in the tensor itself.
@jerryzh168 Thank you for replying. In fact, I am far from model quantization and all the hassle of memory issues.
I have an ML task where the target is a float. I want to run some experiments; in one of them I convert the target to an integer type and use an LLM to predict an integer instead of a float. What I am looking for is a function that helps me map float numbers to integers (quantize), and obviously another function to dequantize.
I've tried to read the quantization docs on the PyTorch website, but I couldn't get through them; I was overwhelmed by the amount of text and detail, all of it about torch modules.
Is there any simple guide on how to do all types of quantization of a single tensor, including how to find the best scale factor? (I see there are a lot of techniques.)
Quantization by itself is relatively straightforward; you can also just define your own quantize op: `tensor / scale + zero_point` (plus rounding and clamping to the target integer range).
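A minimal sketch of such a hand-rolled quantize/dequantize pair, with the rounding and clamping made explicit (the scale/zero_point values below are arbitrary examples, not tuned):

```python
import torch

def quantize(x: torch.Tensor, scale: float, zero_point: int,
             qmin: int = -128, qmax: int = 127) -> torch.Tensor:
    # affine mapping: round to nearest, shift by zero_point, clamp to the int range
    q = torch.round(x / scale) + zero_point
    return q.clamp(qmin, qmax).to(torch.int8)

def dequantize(q: torch.Tensor, scale: float, zero_point: int) -> torch.Tensor:
    # invert the affine mapping; the rounding error is not recoverable
    return (q.to(torch.float32) - zero_point) * scale

x = torch.tensor([-1.0, 0.0, 0.37, 1.0])
q = quantize(x, scale=0.01, zero_point=0)      # int8 values [-100, 0, 37, 100]
x_hat = dequantize(q, scale=0.01, zero_point=0)
```

Note that the reconstruction `x_hat` only matches `x` up to half a quantization step (`scale / 2`), which is the inherent rounding error of the mapping.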
Maybe you can use some of our ops from PyTorch 2 export quantization: https://github.com/pytorch/pytorch/blob/main/torch/ao/quantization/fx/_decomposed.py
To choose quantization parameters, you can also just call: https://github.com/pytorch/pytorch/blob/main/test/quantization/core/test_quantized_tensor.py#L1515
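As an illustration of one common technique (a manual sketch, not necessarily the exact helper used in the linked test), here is min/max-based parameter selection for an asymmetric uint8 range:

```python
import torch

def choose_qparams(x: torch.Tensor, qmin: int = 0, qmax: int = 255):
    # Asymmetric min/max parameter selection.
    # Extend the range to include 0.0 so that zero maps exactly to an integer.
    x_min = min(x.min().item(), 0.0)
    x_max = max(x.max().item(), 0.0)
    scale = (x_max - x_min) / (qmax - qmin)
    # zero_point is the integer that 0.0 maps to, clamped into [qmin, qmax]
    zero_point = qmin - round(x_min / scale)
    zero_point = max(qmin, min(qmax, zero_point))
    return scale, zero_point

x = torch.tensor([0.0, 2.0, 5.1])
scale, zero_point = choose_qparams(x)  # scale ~= 0.02, zero_point == 0
```

This is only the simplest strategy; observers in `torch.ao.quantization` (e.g. moving-average or histogram based) pick parameters more robustly when the data has outliers.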