I’m trying to convert a model using this qconfig:
```python
conf = QConfig(
    activation=Quantizers.FakeQuantize.with_args(
        observer=Observers.MovingAverageMinMaxObserver.with_args(dtype=torch.qint8),
        qscheme=torch.per_tensor_symmetric,
        quant_min=-127,
        quant_max=127,
        dtype=torch.qint8,
    ),
    weight=Quantizers.FakeQuantize.with_args(
        observer=Observers.MovingAveragePerChannelMinMaxObserver,
        quant_min=-127,
        quant_max=127,
        dtype=torch.qint8,
        qscheme=torch.per_channel_symmetric,
        reduce_range=False,
        ch_axis=0,
    ),
)
```
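For completeness, here is the same qconfig written with explicit imports (the `Quantizers`/`Observers` prefixes above are aliases; all of these classes are importable from `torch.ao.quantization`), applied to a small stand-in model so the behavior is reproducible — the real model I use is larger, the `nn.Sequential` here is just a hypothetical placeholder:

```python
import torch
import torch.nn as nn
from torch.ao.quantization import (
    QConfig,
    QConfigMapping,
    FakeQuantize,
    MovingAverageMinMaxObserver,
    MovingAveragePerChannelMinMaxObserver,
)
from torch.ao.quantization.quantize_fx import prepare_fx

conf = QConfig(
    activation=FakeQuantize.with_args(
        observer=MovingAverageMinMaxObserver.with_args(dtype=torch.qint8),
        qscheme=torch.per_tensor_symmetric,
        quant_min=-127,
        quant_max=127,
        dtype=torch.qint8,
    ),
    weight=FakeQuantize.with_args(
        observer=MovingAveragePerChannelMinMaxObserver,
        quant_min=-127,
        quant_max=127,
        dtype=torch.qint8,
        qscheme=torch.per_channel_symmetric,
        reduce_range=False,
        ch_axis=0,
    ),
)

# Toy stand-in for the real model.
model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.ReLU()).eval()

qconfig_mapping = QConfigMapping().set_global(conf)
example_inputs = (torch.randn(1, 3, 16, 16),)
prepared = prepare_fx(model, qconfig_mapping, example_inputs)

# Count the fake-quantize modules that prepare_fx inserted; with quint8
# activations this is non-zero, with qint8 it is not in my runs.
n_fq = sum(isinstance(m, FakeQuantize) for m in prepared.modules())
print(n_fq)
```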
It works for quint8 activations, but not for qint8. I checked the README at pytorch/torch/ao/quantization/fx at master · pytorch/pytorch · GitHub and found this description of observer insertion:
> QuantDeQuantStubs are inserted based on the `qconfig_mapping` provided by users. Also we have a `backend_config` that specifies the configs that are supported by the backend. In this step, we will
>
> - Check if `qconfig_mapping` is compatible with `backend_config` or not; if the user requested a qconfig that is not compatible with `backend_config`, we’ll not insert observers for the operator, the config would just be ignored.
> - Insert observers for the input and output of the subgraph, based on the `qconfig_mapping` (what the user requested) and the `backend_config` (how the operator should be observed in a backend).
After that I checked whether the default backend configs support the qint8 data type. QNNPACK should support qint8 feature maps because of the following configuration in pytorch/qnnpack.py at master · pytorch/pytorch · GitHub:
```python
qnnpack_act_qint8_scale_min_2_neg_12 = DTypeWithConstraints(
    dtype=torch.qint8,
    scale_min_lower_bound=2 ** -12,
)
qnnpack_weighted_op_qint8_symmetric_dtype_config = DTypeConfig(
    input_dtype=qnnpack_act_qint8_scale_min_2_neg_12,
    output_dtype=qnnpack_act_qint8_scale_min_2_neg_12,
    weight_dtype=qnnpack_weight_qint8_neg_127_to_127_scale_min_2_neg_12,
    bias_dtype=torch.float,
)
qnnpack_default_op_qint8_symmetric_dtype_config = DTypeConfig(
    input_dtype=qnnpack_act_qint8_scale_min_2_neg_12,
    output_dtype=qnnpack_act_qint8_scale_min_2_neg_12,
)
```
And these configs are included in the QNNPACK backend config.
I also set the backend with `torch.backends.quantized.engine = "qnnpack"`.
Why are observers not inserted into my model?