Please, how do I perform static quantization on a single tensor? I want the steps to go from a float tensor to a quantized tensor (no math, just PyTorch code).
Do you have any larger context on what you are trying to do? We have a lot of quantize/dequantize ops that you can call, but they may produce different tensors, e.g.
torch.quantize_per_tensor (torch.quantize_per_tensor — PyTorch 2.1 documentation):
the old quantize op that gives you a quantized tensor in PyTorch with quint8/qint8 etc. dtypes.
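For reference, a minimal sketch of that old op in action (the `scale` and `zero_point` values here are chosen by hand purely for illustration):

```python
import torch

x = torch.tensor([0.0, 0.5, 1.0, 1.5])

# quantize: stores scale/zero_point inside the quantized tensor itself
q = torch.quantize_per_tensor(x, scale=0.1, zero_point=0, dtype=torch.qint8)

q.dtype          # torch.qint8
q.int_repr()     # the underlying int8 values: [0, 5, 10, 15]
q.q_scale()      # 0.1
q.q_zero_point() # 0

# dequantize: map back to float using the stored parameters
x_hat = q.dequantize()
```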
We are moving away from the above and want to use native PyTorch integer tensors directly:
this returns a uint8/int8 PyTorch tensor, and it doesn't store the scale/zero_point in the tensor itself.
@jerryzh168 Thank you for replying. In fact, I am far from model quantization and all the hassle of memory issues.
I have an ML task where the target is a float. I want to run some experiments; in one of them I convert the target to an integer type and use an LLM to predict an integer instead of a float. What I am looking for is a function that helps me map float numbers to integers (quantize), and obviously another function to dequantize.
I've tried to read the quantization docs on the PyTorch website, but I couldn't get through them; I was overwhelmed by the amount of text and detail, all of it about torch modules.
Is there any simple guide on how to do all types of quantization of a single tensor, including how to find the best scale factor? (I see there are a lot of techniques.)
Quantization by itself is relatively straightforward; you can also just define your own quantize op: `tensor / scale + zero_point` (plus rounding and clamping to the target integer range).
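A minimal sketch of such a hand-rolled quantize/dequantize pair, with the rounding and clamping made explicit (the scale/zero_point values below are arbitrary examples, not tuned):

```python
import torch

def quantize(x: torch.Tensor, scale: float, zero_point: int,
             qmin: int = -128, qmax: int = 127) -> torch.Tensor:
    # affine mapping: round to nearest, shift by zero_point, clamp to the int range
    q = torch.round(x / scale) + zero_point
    return q.clamp(qmin, qmax).to(torch.int8)

def dequantize(q: torch.Tensor, scale: float, zero_point: int) -> torch.Tensor:
    # invert the affine mapping; the rounding error is not recoverable
    return (q.to(torch.float32) - zero_point) * scale

x = torch.tensor([-1.0, 0.0, 0.37, 1.0])
q = quantize(x, scale=0.01, zero_point=0)      # int8 values [-100, 0, 37, 100]
x_hat = dequantize(q, scale=0.01, zero_point=0)
```

Note that the reconstruction `x_hat` only matches `x` up to half a quantization step (`scale / 2`), which is the inherent rounding error of the mapping.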
Maybe you can use some of our ops from PyTorch 2 export quantization: https://github.com/pytorch/pytorch/blob/main/torch/ao/quantization/fx/_decomposed.py
To choose quantization parameters, you can also just call: https://github.com/pytorch/pytorch/blob/main/test/quantization/core/test_quantized_tensor.py#L1515
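As an illustration of one common technique (a manual sketch, not necessarily the exact helper used in the linked test), here is min/max-based parameter selection for an asymmetric uint8 range:

```python
import torch

def choose_qparams(x: torch.Tensor, qmin: int = 0, qmax: int = 255):
    # Asymmetric min/max parameter selection.
    # Extend the range to include 0.0 so that zero maps exactly to an integer.
    x_min = min(x.min().item(), 0.0)
    x_max = max(x.max().item(), 0.0)
    scale = (x_max - x_min) / (qmax - qmin)
    # zero_point is the integer that 0.0 maps to, clamped into [qmin, qmax]
    zero_point = qmin - round(x_min / scale)
    zero_point = max(qmin, min(qmax, zero_point))
    return scale, zero_point

x = torch.tensor([0.0, 2.0, 5.1])
scale, zero_point = choose_qparams(x)  # scale ~= 0.02, zero_point == 0
```

This is only the simplest strategy; observers in `torch.ao.quantization` (e.g. moving-average or histogram based) pick parameters more robustly when the data has outliers.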