Hello,
why do I have to choose quint8 as the activation quantization dtype for QAT? Is there a way to get past that?
I used a qconfig that someone else posted here on the forum:
import torch

activation_bitwidth = 8  # whatever bit width you want for activations
weight_bitwidth = 8      # whatever bit width you want for weights

fq_activation = torch.quantization.FakeQuantize.with_args(
    observer=torch.quantization.MinMaxObserver.with_args(
        quant_min=-(2 ** activation_bitwidth) // 2,
        quant_max=(2 ** activation_bitwidth) // 2 - 1,
        dtype=torch.qint8,
        qscheme=torch.per_tensor_symmetric,
        reduce_range=False))
fq_weights = torch.quantization.FakeQuantize.with_args(
    observer=torch.quantization.MinMaxObserver.with_args(
        quant_min=-(2 ** weight_bitwidth) // 2,
        quant_max=(2 ** weight_bitwidth) // 2 - 1,
        dtype=torch.qint8,
        qscheme=torch.per_tensor_symmetric,
        reduce_range=False))
intB_qat_qconfig = torch.quantization.QConfig(activation=fq_activation, weight=fq_weights)
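To show where I'm using it, here is a minimal, self-contained sketch of how I attach this qconfig to a model and run QAT preparation. The toy `nn.Sequential` model and the input shape are just placeholders, not my actual network:

```python
import torch
import torch.nn as nn

bitwidth = 8  # same symmetric qint8 range as the qconfig above
fq = torch.quantization.FakeQuantize.with_args(
    observer=torch.quantization.MinMaxObserver.with_args(
        quant_min=-(2 ** bitwidth) // 2,
        quant_max=(2 ** bitwidth) // 2 - 1,
        dtype=torch.qint8,
        qscheme=torch.per_tensor_symmetric,
        reduce_range=False))
qconfig = torch.quantization.QConfig(activation=fq, weight=fq)

# Placeholder model, just to exercise the qconfig.
model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.ReLU())
model.train()
model.qconfig = qconfig

# Insert fake-quantize modules for QAT (eager-mode API).
torch.quantization.prepare_qat(model, inplace=True)

# The fake-quantized forward pass itself runs fine with qint8 activations.
out = model(torch.randn(1, 3, 8, 8))
```

The fake-quant training step accepts this qconfig; my question is about the quint8 requirement on activations that shows up afterwards.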