I’m trying to convert a model using this qconfig:
```python
conf = QConfig(
    activation=Quantizers.FakeQuantize.with_args(
        observer=Observers.MovingAverageMinMaxObserver.with_args(dtype=torch.qint8),
        qscheme=torch.per_tensor_symmetric,
        quant_min=-127,
        quant_max=127,
        dtype=torch.qint8,
    ),
    weight=Quantizers.FakeQuantize.with_args(
        observer=Observers.MovingAveragePerChannelMinMaxObserver,
        quant_min=-127,
        quant_max=127,
        dtype=torch.qint8,
        qscheme=torch.per_channel_symmetric,
        reduce_range=False,
        ch_axis=0,
    ),
)
```
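For completeness, here is the same qconfig written with explicit imports (the `Quantizers`/`Observers` prefixes above are aliases; all of these classes are importable from `torch.ao.quantization`), applied to a small stand-in model so the behavior is reproducible — the real model I use is larger, the `nn.Sequential` here is just a hypothetical placeholder:

```python
import torch
import torch.nn as nn
from torch.ao.quantization import (
    QConfig,
    QConfigMapping,
    FakeQuantize,
    MovingAverageMinMaxObserver,
    MovingAveragePerChannelMinMaxObserver,
)
from torch.ao.quantization.quantize_fx import prepare_fx

conf = QConfig(
    activation=FakeQuantize.with_args(
        observer=MovingAverageMinMaxObserver.with_args(dtype=torch.qint8),
        qscheme=torch.per_tensor_symmetric,
        quant_min=-127,
        quant_max=127,
        dtype=torch.qint8,
    ),
    weight=FakeQuantize.with_args(
        observer=MovingAveragePerChannelMinMaxObserver,
        quant_min=-127,
        quant_max=127,
        dtype=torch.qint8,
        qscheme=torch.per_channel_symmetric,
        reduce_range=False,
        ch_axis=0,
    ),
)

# Toy stand-in for the real model.
model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.ReLU()).eval()

qconfig_mapping = QConfigMapping().set_global(conf)
example_inputs = (torch.randn(1, 3, 16, 16),)
prepared = prepare_fx(model, qconfig_mapping, example_inputs)

# Count the fake-quantize modules that prepare_fx inserted; with quint8
# activations this is non-zero, with qint8 it is not in my runs.
n_fq = sum(isinstance(m, FakeQuantize) for m in prepared.modules())
print(n_fq)
```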
It works for quint8 activations, but not for qint8. I checked the README at pytorch/torch/ao/quantization/fx at master · pytorch/pytorch · GitHub and found this description of observer insertion:
> QuantDeQuantStubs are inserted based on the `qconfig_mapping` provided by users. Also we have a `backend_config` that specifies the configs that are supported by the backend. In this step, we will
>
> - Check if `qconfig_mapping` is compatible with `backend_config` or not; if the user requested a qconfig that is not compatible with `backend_config`, we’ll not insert observers for the operator, the config would just be ignored.
> - Insert observers for the input and output of the subgraph, based on the `qconfig_mapping` (what the user requested) and the `backend_config` (how the operator should be observed in a backend).
After that I checked whether the default backend configs support the qint8 data type. QNNPACK should support qint8 feature maps because of the following configuration in pytorch/qnnpack.py at master · pytorch/pytorch · GitHub:
```python
qnnpack_act_qint8_scale_min_2_neg_12 = DTypeWithConstraints(
    dtype=torch.qint8,
    scale_min_lower_bound=2 ** -12,
)
qnnpack_weighted_op_qint8_symmetric_dtype_config = DTypeConfig(
    input_dtype=qnnpack_act_qint8_scale_min_2_neg_12,
    output_dtype=qnnpack_act_qint8_scale_min_2_neg_12,
    weight_dtype=qnnpack_weight_qint8_neg_127_to_127_scale_min_2_neg_12,
    bias_dtype=torch.float,
)
qnnpack_default_op_qint8_symmetric_dtype_config = DTypeConfig(
    input_dtype=qnnpack_act_qint8_scale_min_2_neg_12,
    output_dtype=qnnpack_act_qint8_scale_min_2_neg_12,
)
```
And these configs are included in the QNNPACK backend config.
I also set the backend with `torch.backends.quantized.engine = "qnnpack"`.
Why are observers not inserted into my model?