Per-tensor quantization in quantization-aware training

The tutorial here provides an example of per-channel quantization training.

In my case I need to perform per-tensor quantization since the downstream mobile-device inference library (e.g. TNN) does not support per-channel quantized models.

I think the problem comes down to how to set up per-tensor quantization at this step:

model.qconfig = torch.quantization.get_default_qat_qconfig("fbgemm")

Currently this part is not extensively documented and I cannot find many resources.

So could someone give an example configuration for per-tensor quantization?

Hi @kaizhao,

The qnnpack backend has default settings with per-Tensor observers. You can create a config with this setting like this:

model.qconfig = torch.quantization.get_default_qat_qconfig("qnnpack")
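For context, here is a rough plain-Python sketch of what the two granularities mean for weight scales. It does not use PyTorch at all; the weight values and the `symmetric_scale` helper are made up for illustration, and real observers handle more details (zero points, moving averages, reduced ranges).

```python
# Hypothetical weights for a layer with two output channels (one row each).
weights = [
    [0.5, -1.0, 0.25],
    [0.02, -0.04, 0.01],
]

def symmetric_scale(values, qmax=127):
    """Scale that maps the largest magnitude in `values` to `qmax`."""
    return max(abs(v) for v in values) / qmax

# Per-tensor: a single scale shared by every channel in the tensor.
per_tensor_scale = symmetric_scale([v for row in weights for v in row])

# Per-channel: one scale for each output channel.
per_channel_scales = [symmetric_scale(row) for row in weights]

print("per-tensor scale:  ", per_tensor_scale)
print("per-channel scales:", per_channel_scales)
```

With per-tensor quantization the second channel has to share the scale set by the first one, which is coarser for small weights but is the only form some inference libraries accept.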

If you’d like to customize the qconfig manually, you could take a look at qconfig.py in the pytorch/pytorch repository on GitHub.

You can change just the per-channel setting with something like this:

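from torch.quantization import (QConfig, FakeQuantize, MovingAverageMinMaxObserver,
                                default_weight_fake_quant)

# Weights use the per-tensor fake quant (default_weight_fake_quant)
# in place of default_per_channel_weight_fake_quant.
qconfig = QConfig(activation=FakeQuantize.with_args(observer=MovingAverageMinMaxObserver,
                                                    quant_min=0,
                                                    quant_max=255,
                                                    reduce_range=True),
                  weight=default_weight_fake_quant)
model.qconfig = qconfig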
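
To check that a qconfig like the one above really gives per-tensor weights, you could prepare a small model for QAT and look at the qscheme of the weight fake-quant module. This is only a minimal sketch: `TinyNet` is a hypothetical stand-in for your model, and it assumes the eager-mode `torch.quantization` API (newer releases expose the same names under `torch.ao.quantization`).

```python
import torch
import torch.nn as nn
from torch.quantization import (
    QConfig,
    FakeQuantize,
    MovingAverageMinMaxObserver,
    default_weight_fake_quant,
    prepare_qat,
)

# TinyNet is a hypothetical toy model, used only for this check.
class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.quant = torch.quantization.QuantStub()
        self.conv = nn.Conv2d(3, 8, kernel_size=3)
        self.dequant = torch.quantization.DeQuantStub()

    def forward(self, x):
        return self.dequant(self.conv(self.quant(x)))

model = TinyNet().train()  # prepare_qat expects a model in training mode
model.qconfig = QConfig(
    activation=FakeQuantize.with_args(
        observer=MovingAverageMinMaxObserver,
        quant_min=0,
        quant_max=255,
        reduce_range=True,
    ),
    weight=default_weight_fake_quant,  # per-tensor weight fake quant
)
prepare_qat(model, inplace=True)

# After prepare_qat the conv is swapped for its QAT version,
# which carries a weight_fake_quant submodule.
weight_fq = model.conv.weight_fake_quant
print(weight_fq.qscheme)
```

If the printed qscheme is a per-tensor one rather than a per-channel one, the weights will be quantized per tensor after conversion.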