With a quantized model, it's necessary to set the correct backend (`fbgemm` or `qnnpack`) for inference.
But in quantization aware training, does this backend affect the training?
For instance, can I train the quantized model using the `fbgemm` backend and then use it with `qnnpack` in the inference phase?
Hi @eefahd ,
There are a couple of things to keep in mind:
- default qconfigs have different settings for `fbgemm` and `qnnpack`. One setting in particular, `reduce_range`, if set to `False`, only works correctly in `qnnpack` and leads to potential overflow in `fbgemm`.
- when weights are packed, the global backend setting is used to determine whether to pack for `fbgemm` or for `qnnpack`.
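As a rough sketch of the workflow this implies (assuming a PyTorch build where both engines are compiled in, and using the `torch.ao.quantization` qconfig helpers), you would pick the qconfig for your training setup and then set the global engine to the target backend before packing/inference:

```python
import torch
from torch.ao.quantization import get_default_qat_qconfig

# The default QAT qconfigs differ per backend; in particular,
# reduce_range is set differently for fbgemm vs. qnnpack.
fbgemm_qconfig = get_default_qat_qconfig('fbgemm')
qnnpack_qconfig = get_default_qat_qconfig('qnnpack')
print(fbgemm_qconfig)
print(qnnpack_qconfig)

# Before convert()/inference, the global engine decides how weights
# are packed, so it must match the backend you will actually run on.
if 'qnnpack' in torch.backends.quantized.supported_engines:
    torch.backends.quantized.engine = 'qnnpack'
print(torch.backends.quantized.engine)
```

So if you plan to run inference with `qnnpack`, the safer path is to train with the `qnnpack` qconfig as well, so the fake-quant settings seen during training match what the inference backend supports.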