When preparing a model for QAT, why do we need the input and output tensors to share the same quantization parameters, as described here?
Have figured it out:)
For others stumbling across this, the answer is that this quantization handler only applies to a specific set of functions whose output range can be inferred from their input range. For example, ReLU's output range is always 0 to the max of the input range, so the output can reuse the input's quantization parameters.
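To make this concrete, here is a small NumPy sketch (the scale and zero point are made-up values for illustration, not anything from the handler itself). When the output shares the input's quantization parameters, quantized ReLU reduces to clamping at the zero point, with no requantization step:

```python
import numpy as np

# Hypothetical qparams for illustration (asymmetric, quint8-style).
scale, zero_point = 0.1, 30

def quantize(x):
    return np.clip(np.round(x / scale) + zero_point, 0, 255).astype(np.uint8)

def dequantize(q):
    return (q.astype(np.int32) - zero_point) * scale

x = np.array([-2.0, -0.5, 0.0, 0.7, 1.5])
q = quantize(x)

# Because output shares the input's (scale, zero_point), quantized ReLU
# is just a clamp at the zero point.
q_relu = np.maximum(q, zero_point)

# Matches float ReLU followed by quantization with the same parameters.
assert np.array_equal(q_relu, quantize(np.maximum(x, 0.0)))
```

If the output had its own independent quantization parameters instead, the op would need a real requantization (rescale and re-round), which is exactly what sharing the parameters lets the handler avoid.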