As for quantization of a trained model, I suppose we have to know the dynamic range (value range) of its FP32 values so that a proper range can be chosen when quantizing the trained model to INT8.
I guess that if the FP32 range is extremely large, every feature (or feature map, if it is 2D) we extract could collapse to a single value (or a flat image in the 2D case) after quantization. So I'm curious about how quantization is handled in PyTorch:
1) Is it possible to know which FP32 range is mapped to INT8 when quantization is applied to the trained model?
2) Where does this range come from? The pixel RGB values? A combination of the pixel RGB values and the convolution filters (kernels)? Something else?
3) We usually train classification models on RGB images; do RGB images produce (or require) a wider FP32 range than, for instance, grayscale images would during training?
4) If I want to shrink the original FP32 range (which would let the quantization use the INT8 levels more finely), is there a nice way to do so? (Use a grayscale image instead of an RGB image, as I mentioned above?)
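To make my concern concrete, here is a small plain-Python sketch of per-tensor affine quantization (the numbers and helper names are made up purely for illustration): a wide FP32 range makes each INT8 step coarse, so nearby values collapse to the same code.

```python
# Sketch of per-tensor affine quantization: q = round(x / scale) + zero_point.
# All values below are hypothetical, chosen only to illustrate the effect.

def qparams(min_val: float, max_val: float, qmin: int = 0, qmax: int = 255):
    """Compute scale and zero_point mapping [min_val, max_val] onto [qmin, qmax]."""
    scale = (max_val - min_val) / (qmax - qmin)
    zero_point = qmin - round(min_val / scale)
    return scale, zero_point

def quantize(x: float, scale: float, zero_point: int,
             qmin: int = 0, qmax: int = 255) -> int:
    """Quantize one FP32 value to an integer code, clamped to [qmin, qmax]."""
    return max(qmin, min(qmax, round(x / scale) + zero_point))

# Narrow observed range [0, 1]: one INT8 step covers ~0.004, so 0.10 and
# 0.11 land on different codes.
s, zp = qparams(0.0, 1.0)
print(quantize(0.10, s, zp), quantize(0.11, s, zp))  # two different codes

# Very wide observed range [-500, 500]: one step now covers ~3.9, so the
# same two values collapse to a single code.
s, zp = qparams(-500.0, 500.0)
print(quantize(0.10, s, zp), quantize(0.11, s, zp))  # identical codes
```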
Yes. We insert observers for both activation and weight Tensors; you can look at the observers' attributes to learn the range at that point, e.g. observer.min_val and observer.max_val.
Depending on which Tensor the observer is attached to, it may be observing activations (e.g. images) or weights.
The range is based on the calibration dataset, so if you provide a calibration dataset with representative data, we'll be able to capture the range properly.
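A minimal sketch of that flow using the eager-mode quantization APIs (the tiny model and the random "calibration" batches here are made up for illustration; in practice you would feed representative real data):

```python
import torch
import torch.nn as nn
from torch.ao.quantization import get_default_qconfig, prepare

class Tiny(nn.Module):
    """A made-up two-layer model, just to have something to observe."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 8, 3)
        self.relu = nn.ReLU()

    def forward(self, x):
        return self.relu(self.conv(x))

model = Tiny().eval()
model.qconfig = get_default_qconfig("fbgemm")
prepared = prepare(model)  # inserts observers into the model

# Calibration: run representative data through the prepared model so the
# observers can record the FP32 ranges actually seen.
for _ in range(4):
    prepared(torch.randn(1, 3, 32, 32))

# Each observed module now carries an observer (activation_post_process)
# whose min_val / max_val hold the FP32 range seen during calibration.
obs = prepared.conv.activation_post_process
print(obs.min_val, obs.max_val)

# The INT8 mapping (scale, zero_point) is derived from that range.
scale, zero_point = obs.calculate_qparams()
print(scale, zero_point)
```

After this, `torch.ao.quantization.convert(prepared)` would produce the actual INT8 model using those recorded ranges.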
Thank you for the very useful comments; they help me understand how quantization works in PyTorch. I'll check the information you gave me! (If I have further questions, I'll ask again.)