Qlinear (ONEDNN): data type of input should be QUint8

Hello,

Why do I have to choose the activation quantization dtype for QAT to be quint8? Is there a way to get past that?

I used a qconfig that someone else posted here on the forum:
import torch

activation_bitwidth = 8  # whatever bit width you want for activations
bitwidth = 8             # whatever bit width you want for weights

# Fake-quantize activations as symmetric signed int8
fq_activation = torch.quantization.FakeQuantize.with_args(
    observer=torch.quantization.MinMaxObserver.with_args(
        quant_min=-(2 ** activation_bitwidth) // 2,
        quant_max=(2 ** activation_bitwidth) // 2 - 1,
        dtype=torch.qint8,
        qscheme=torch.per_tensor_symmetric,
        reduce_range=False,
    )
)

# Fake-quantize weights as symmetric signed int8
fq_weights = torch.quantization.FakeQuantize.with_args(
    observer=torch.quantization.MinMaxObserver.with_args(
        quant_min=-(2 ** bitwidth) // 2,
        quant_max=(2 ** bitwidth) // 2 - 1,
        dtype=torch.qint8,
        qscheme=torch.per_tensor_symmetric,
        reduce_range=False,
    )
)

intB_qat_qconfig = torch.quantization.QConfig(activation=fq_activation, weight=fq_weights)
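
For reference, here is a minimal sketch of how such a qconfig gets applied in eager-mode QAT (the toy model and shapes below are placeholders for illustration only; with the onednn engine, the error in the title shows up when running the converted model):

import torch.nn as nn

class ToyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.quant = torch.quantization.QuantStub()    # quantizes the input
        self.fc = nn.Linear(16, 16)
        self.dequant = torch.quantization.DeQuantStub()  # dequantizes the output

    def forward(self, x):
        return self.dequant(self.fc(self.quant(x)))

model = ToyModel()
model.train()
model.qconfig = intB_qat_qconfig
torch.quantization.prepare_qat(model, inplace=True)

# ... QAT fine-tuning loop goes here ...

model.eval()
quantized = torch.quantization.convert(model)
quantized(torch.randn(1, 16))  # onednn's QLinear expects quint8 input, hence the error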

This is a limitation of the backend (quantized engine) you are lowering to, I think. Do you want qint8 instead? You can try using XNNPACK instead: if you set torch.backends.quantized.engine = "qnnpack", you should be able to run the model that's quantized with qint8.
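
Concretely, the switch suggested above would look something like this (a small sketch; check which engines your PyTorch build actually supports first):

import torch

# List the quantized engines available in this build of PyTorch
print(torch.backends.quantized.supported_engines)

# Per the note above, the qnnpack engine accepts qint8 activations;
# set it before converting/running the quantized model
torch.backends.quantized.engine = "qnnpack"

# Then convert and run the QAT model as before:
# quantized = torch.quantization.convert(model.eval())
# quantized(torch.randn(1, 16))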