How to freeze the FakeQuantize zero_point during training

I need to train a quantized model that has a zero offset (zero_point = 0) due to limitations of my inference framework.
I'm following the flow described in https://pytorch.org/tutorials/advanced/static_quantization_tutorial.html,
so the model is prepared with prepare_qat, which adds FakeQuantize layers.
The problem is that both scale and zero_point are being trained. I need the zero_point to be fixed at 0.

The scale and zero_point aren't trained - they are calculated by observers inserted into the network. You can implement an observer specific to your use case that fixes the zero_point at 0. For reference, the zero_point calculation happens in https://github.com/pytorch/pytorch/blob/master/torch/quantization/observer.py#L187
Observers are set when you initialize the qconfig (in this case you seem to be using the default, i.e. https://github.com/pytorch/pytorch/blob/master/torch/quantization/qconfig.py#L90).
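As a minimal sketch of such an observer (assuming PyTorch's `torch.quantization` observer API; the class name `FixedZeroPointObserver` is made up for illustration), you could subclass `MinMaxObserver` and override `calculate_qparams`:

```python
import torch
from torch.quantization.observer import MinMaxObserver

class FixedZeroPointObserver(MinMaxObserver):
    """Illustrative observer: reuses MinMaxObserver's scale computation
    but always reports zero_point = 0. Note that pinning zero_point at 0
    with a quint8 range means only non-negative float values remain
    representable, which suits post-ReLU activations."""

    def calculate_qparams(self):
        scale, zero_point = super().calculate_qparams()
        # Keep the computed scale, pin the offset at 0
        return scale, torch.zeros_like(zero_point)

# Usage sketch: feed some data through, then read out the qparams
obs = FixedZeroPointObserver(dtype=torch.quint8, qscheme=torch.per_tensor_affine)
obs(torch.tensor([-1.0, 2.0]))  # record running min/max
scale, zero_point = obs.calculate_qparams()
# zero_point is now 0 regardless of the observed range
```

A more careful variant would also recompute the scale so the full quantized range maps onto the observed positive range, rather than keeping the scale derived from the affine calculation.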


Thanks for your input.

Do you know a good example of applying custom observers?

qconfig = QConfig(activation=FakeQuantize.with_args(observer=,
                                                    quant_min=0,
                                                    quant_max=255,
                                                    reduce_range=True),
                  weight=default_per_channel_weight_fake_quant)

Is this enough for setting a custom observer, or are there some nuances?

You can follow any of the observers defined in https://github.com/pytorch/pytorch/blob/master/torch/quantization/observer.py as a starting point.

To enable it in the qconfig you can do:
FakeQuantize.with_args(observer=MyObserver, quant_min=0, quant_max=255, dtype=torch.quint8, qscheme=torch.per_tensor_affine)
(note that a quant range of 0-255 corresponds to torch.quint8, not torch.qint8)
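Putting it together, wiring the qconfig into a model for QAT might look like the sketch below. `MovingAverageMinMaxObserver` stands in for your custom observer class here (swap in your own), and the two-layer model is just a placeholder:

```python
import torch
from torch.quantization import (QConfig, FakeQuantize, prepare_qat,
                                default_per_channel_weight_fake_quant)
from torch.quantization.observer import MovingAverageMinMaxObserver

# Replace MovingAverageMinMaxObserver with your custom observer class
qconfig = QConfig(
    activation=FakeQuantize.with_args(
        observer=MovingAverageMinMaxObserver,
        quant_min=0,
        quant_max=255,
        dtype=torch.quint8,
        qscheme=torch.per_tensor_affine,
        reduce_range=True,
    ),
    weight=default_per_channel_weight_fake_quant,
)

# Placeholder model; prepare_qat swaps modules for their QAT-aware
# versions and attaches the FakeQuantize/observer instances
model = torch.nn.Sequential(torch.nn.Conv2d(3, 8, 3), torch.nn.ReLU())
model.train()          # prepare_qat expects the model in training mode
model.qconfig = qconfig
prepare_qat(model, inplace=True)
```

After this, training proceeds as usual and the observers update the qparams from the data flowing through the network.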
