I’m trying to restrict my model’s values to a more limited range. I set quant_max and quant_min to the new values as arguments to FakeQuantize, but when I actually print int_repr() of the values, they are still in 0–255 (or -128 to 127).
Right now the observer uses a fixed range depending on dtype: https://github.com/pytorch/pytorch/blob/master/torch/quantization/observer.py#L153-L162, so the quant_min/quant_max you pass to FakeQuantize are not reflected in the observed range. Feel free to add quant_min and quant_max support to the observer.
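To illustrate why the observer's range matters: fake quantization rounds to an integer grid and clamps against quant_min/quant_max, so the integer representation can only shrink below 0–255 if that reduced range is actually applied. The sketch below is not the PyTorch implementation, just a minimal plain-Python model of the round-and-clamp step (the function name and parameters mirror the discussion but are otherwise hypothetical):

```python
# Minimal sketch of per-tensor affine fake quantization with a
# configurable integer range. NOT PyTorch's implementation; it only
# shows why the observer must honor quant_min/quant_max for the
# int_repr() values to stay inside a sub-8-bit range.

def fake_quantize(x, scale, zero_point, quant_min, quant_max):
    """Quantize x to an integer in [quant_min, quant_max], then dequantize.

    Returns (int_repr, dequantized_value).
    """
    q = round(x / scale) + zero_point
    q = max(quant_min, min(quant_max, q))   # clamp to the reduced range
    return q, (q - zero_point) * scale

# With a 4-bit range [0, 15], a large input saturates at 15 rather
# than at the 8-bit maximum of 255:
int_repr, dq = fake_quantize(10.0, scale=0.1, zero_point=0,
                             quant_min=0, quant_max=15)
```

If the observer instead hard-codes the 0–255 range from the dtype, the clamp above never sees the narrower bounds, which matches the behavior reported in the question.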
cc @raghuramank100 for sub-8-bit observer support.
Yes, we have a PR in the works supporting sub-8-bit quantization at https://github.com/pytorch/pytorch/pull/33743; expect it to land by the end of next week.