The FP32-to-INT8 quantization range: how to inspect and change it

When quantizing a trained model, I suppose we need to know the model's dynamic range (value range) in FP32 so that a proper range can be chosen when applying INT8 quantization to it.

I guess that if the FP32 range is extremely large, every feature we extract (or feature map, if it's 2D) could collapse to a single value (or a flat image in the 2D case). So I'm curious about how quantization is handled in PyTorch:

1). Is it possible to know which FP32 range is mapped to INT8 when quantization is applied to the trained model?

2). Where does this range come from? Pixel RGB values? A combination of pixel RGB values and the convolution filter (kernel)? Something else?

3). Usually we use RGB images to train a model for a classification task. My question is: do RGB images provide (or require) a wider FP32 range compared to, for instance, training on grayscale images?

4). If I want to shrink the original FP32 range (which effectively gives INT8 more resolution when quantizing), is there a good way to do so? (For example, using a grayscale image instead of an RGB image, as mentioned above?)

First, please make sure to read through the Quantization — PyTorch master documentation to get a high-level understanding of what our tool can do.

  1. Yes. We insert observers for both activation and weight Tensors; you can look at the observers' attributes to learn the range at that point, e.g. observer.min_val and observer.max_val.
  2. Depending on which Tensor the observer is attached to, it may be observing images (activations) or weights.
  3. The range is based on the calibration dataset, so if you provide a calibration dataset with representative data, we'll be able to capture the range properly.
  4. I think you can try HistogramObserver, which optimizes for the smallest quantization error; we typically use it to observe activations: pytorch/observer.py at master · pytorch/pytorch · GitHub
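A minimal sketch of the idea in points 1 and 4: an observer is just a module you feed Tensors through, after which it exposes the observed FP32 range and can compute the scale/zero-point. The import path and the random calibration data below are assumptions for illustration; in recent PyTorch the observers live under `torch.ao.quantization.observer` (older versions use `torch.quantization.observer`).

```python
import torch
from torch.ao.quantization.observer import MinMaxObserver, HistogramObserver

# Feed a few "calibration" tensors through a MinMaxObserver;
# it tracks the running min/max of everything it sees.
obs = MinMaxObserver(dtype=torch.quint8)
for _ in range(5):
    obs(torch.randn(16, 16))

# The observed FP32 range (what the answer above refers to):
print("range:", obs.min_val.item(), obs.max_val.item())

# The range is turned into quantization parameters on demand:
scale, zero_point = obs.calculate_qparams()
print("scale:", scale.item(), "zero_point:", zero_point.item())

# HistogramObserver works the same way, but searches for a
# clipped range that minimizes quantization error instead of
# using the raw min/max.
hobs = HistogramObserver(dtype=torch.quint8)
hobs(torch.randn(16, 16))
```

So questions 2 and 3 above reduce to: whatever Tensors actually flow through an observer during calibration (input images after preprocessing, or weights) determine the range, which is why representative calibration data matters.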

Thank you for the very helpful comments on understanding how quantization works in PyTorch; I'll check the information you gave me! (If I have further questions, I'll ask again.)

I tried this tutorial to understand the flow of quantization. The flow itself is OK.

But I have no idea how to use observer.min_val and observer.max_val to get the quantization range… If you have any hints on how to extract them, please advise me.

Oh, you can check out the code in observer.py: pytorch/observer.py at master · pytorch/pytorch · GitHub
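To make the pointer above concrete, here is one way to find the observers inside a prepared model and read their min_val/max_val. This is a sketch under assumptions: the toy model, the random calibration data, and the `"fbgemm"` backend choice are all made up for illustration, and the module path `torch.ao.quantization` may be `torch.quantization` on older releases.

```python
import torch
import torch.ao.quantization as tq

# Hypothetical toy model; QuantStub/DeQuantStub mark the region to
# be quantized in eager-mode static quantization.
class ToyModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.quant = tq.QuantStub()
        self.conv = torch.nn.Conv2d(3, 8, kernel_size=3)
        self.relu = torch.nn.ReLU()
        self.dequant = tq.DeQuantStub()

    def forward(self, x):
        return self.dequant(self.relu(self.conv(self.quant(x))))

model = ToyModel().eval()
model.qconfig = tq.get_default_qconfig("fbgemm")
tq.prepare(model, inplace=True)  # inserts observers

# Calibrate with (here: random, normally representative) data.
for _ in range(8):
    model(torch.randn(1, 3, 32, 32))

# Every inserted observer exposes the observed FP32 range as
# buffers named min_val / max_val; walk the modules to find them.
for name, module in model.named_modules():
    if hasattr(module, "min_val") and hasattr(module, "max_val"):
        print(name, module.min_val, module.max_val)
```

Note that weight observers may be per-channel, in which case min_val and max_val are Tensors with one entry per output channel rather than scalars.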