As for quantization of a trained model, I suppose we have to know the dynamic range (value range) of its FP32 values so that a proper range can be chosen when quantizing the trained model to INT8.
I guess that if the FP32 range is extremely large, every feature (or feature map, if it is 2D) we extract could collapse to a single value (or a flat image in the 2D case) after quantization. So I'm curious about how quantization is handled in PyTorch:
1) Is it possible to know which FP32 range is mapped to INT8 when quantization is applied to the trained model?
2) Where does this range come from? The pixel RGB values? A combination of the pixel RGB values and the convolution filters (kernels)? Something else?
3) We usually train classification models on RGB images; do RGB images produce (or require) a wider FP32 range than, for instance, grayscale images would during training?
4) If I want to shrink the original FP32 range (which would let the quantization use the INT8 levels more finely), is there a nice way to do so? (Use a grayscale image instead of an RGB image, as I mentioned above?)
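To make my concern concrete, here is a small plain-Python sketch of per-tensor affine quantization (the numbers and helper names are made up purely for illustration): a wide FP32 range makes each INT8 step coarse, so nearby values collapse to the same code.

```python
# Sketch of per-tensor affine quantization: q = round(x / scale) + zero_point.
# All values below are hypothetical, chosen only to illustrate the effect.

def qparams(min_val: float, max_val: float, qmin: int = 0, qmax: int = 255):
    """Compute scale and zero_point mapping [min_val, max_val] onto [qmin, qmax]."""
    scale = (max_val - min_val) / (qmax - qmin)
    zero_point = qmin - round(min_val / scale)
    return scale, zero_point

def quantize(x: float, scale: float, zero_point: int,
             qmin: int = 0, qmax: int = 255) -> int:
    """Quantize one FP32 value to an integer code, clamped to [qmin, qmax]."""
    return max(qmin, min(qmax, round(x / scale) + zero_point))

# Narrow observed range [0, 1]: one INT8 step covers ~0.004, so 0.10 and
# 0.11 land on different codes.
s, zp = qparams(0.0, 1.0)
print(quantize(0.10, s, zp), quantize(0.11, s, zp))  # two different codes

# Very wide observed range [-500, 500]: one step now covers ~3.9, so the
# same two values collapse to a single code.
s, zp = qparams(-500.0, 500.0)
print(quantize(0.10, s, zp), quantize(0.11, s, zp))  # identical codes
```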
Yes. We insert observers for both activation and weight Tensors; you can look at the observers' attributes to learn the range at that point, e.g. observer.min_val and observer.max_val.
Depending on which Tensor the observer is attached to, it may be observing activations (e.g. images) or weights.
The range is based on the calibration dataset, so if you provide a calibration dataset with representative data, we'll be able to capture the range properly.
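A minimal sketch of that flow using the eager-mode quantization APIs (the tiny model and the random "calibration" batches here are made up for illustration; in practice you would feed representative real data):

```python
import torch
import torch.nn as nn
from torch.ao.quantization import get_default_qconfig, prepare

class Tiny(nn.Module):
    """A made-up two-layer model, just to have something to observe."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 8, 3)
        self.relu = nn.ReLU()

    def forward(self, x):
        return self.relu(self.conv(x))

model = Tiny().eval()
model.qconfig = get_default_qconfig("fbgemm")
prepared = prepare(model)  # inserts observers into the model

# Calibration: run representative data through the prepared model so the
# observers can record the FP32 ranges actually seen.
for _ in range(4):
    prepared(torch.randn(1, 3, 32, 32))

# Each observed module now carries an observer (activation_post_process)
# whose min_val / max_val hold the FP32 range seen during calibration.
obs = prepared.conv.activation_post_process
print(obs.min_val, obs.max_val)

# The INT8 mapping (scale, zero_point) is derived from that range.
scale, zero_point = obs.calculate_qparams()
print(scale, zero_point)
```

After this, `torch.ao.quantization.convert(prepared)` would produce the actual INT8 model using those recorded ranges.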
Thank you for the very useful comments; they help me understand how quantization works in PyTorch. I'll check the information you gave me! (If I have further questions, I'll ask again.)