Questions about preparing QAT model

aichenaxx · June 1, 2023, 9:21am

I noticed that in the official tutorial, the author uses torch.quantization.prepare_qat(model) to prepare a QAT model. However, this function has three steps: propagate_qconfig, convert, and prepare. In the convert step, the module is converted into a fake-quant module, and I think an observer has already been added to the module at that point. But the prepare step then adds an observer to the module again. So why add an observer to a fake-quant module?

andrewor · June 5, 2023, 3:21pm

Hi aichenaxx,

In general, the prepare step inserts observers, the calibrate/train step populates the statistics in these observers, and the convert step replaces these observers with quant/dequant ops. This is true for both PTQ and QAT. I suppose in your question you’re asking about the implementation of prepare_qat specifically. The “convert” call inside prepare_qat is actually mostly for custom modules and is in no way related to the convert step I described above. It is just an implementation detail that reuses the same code as the convert step to swap special modules as specified by the user. You can still think of the larger prepare_qat as a single prepare step that primarily inserts observers. Please let me know if anything’s not clear.

Best,
-Andrew
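
For readers who want to see the three steps side by side, here is a minimal eager-mode PTQ sketch of prepare, calibrate, and convert. It is an illustration under assumptions rather than anything posted in the thread: the toy model, the random calibration batch, and the choice of the torch.ao.quantization namespace (the newer home of torch.quantization) are all mine, and it assumes a PyTorch build with a quantized engine available.

```python
import torch
import torch.nn as nn
import torch.ao.nn.quantized as nnq
from torch.ao.quantization import (
    DeQuantStub,
    QuantStub,
    convert,
    get_default_qconfig,
    prepare,
)

# Hypothetical toy model; the stubs mark where tensors enter and leave
# the quantized region in eager mode.
model = nn.Sequential(QuantStub(), nn.Linear(4, 4), DeQuantStub()).eval()

# Assumption: use whichever quantized engine this PyTorch build selected.
model.qconfig = get_default_qconfig(torch.backends.quantized.engine)

# 1) prepare: observers are attached, but they have seen no data yet.
prepared = prepare(model)
observer = prepared[1].activation_post_process
seen_before = bool(torch.isfinite(observer.min_val).all())

# 2) calibrate: a forward pass populates the observer statistics.
with torch.no_grad():
    prepared(torch.randn(8, 4))
seen_after = bool(torch.isfinite(observer.min_val).all())

# 3) convert: observed float modules are replaced by quantized ones.
quantized = convert(prepared)
linear_is_quantized = isinstance(quantized[1], nnq.Linear)

print(seen_before, seen_after, type(quantized[1]).__name__)
```

The QAT flow has the same shape, with prepare_qat in place of prepare and a training loop in place of the calibration pass.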

aichenaxx · June 7, 2023, 12:49pm

@andrewor Thanks a lot for your reply. I still cannot understand the “add observer” part of this process. I think the “convert” call inside prepare_qat has already transformed the original module into a module containing a “FakeQuantize” module, for example nn.Linear ==> nnqat.Linear. I think this nnqat.Linear module has its own “observer”, and the “prepare” call inside prepare_qat also does “add_observer”. I wonder whether, after the whole prepare process, this module has two “observers”. If so, why is one observer not enough?

andrewor · June 7, 2023, 3:08pm

Hi aichenaxx, in QAT the “observer” we insert is actually a FakeQuantize, but we use the same logic to insert them. Each FakeQuantize actually has an observer inside it, so when I say “insert observer” what I really mean is “insert observer or fake quantize.” The module swap you’re referring to (nn.Linear → nnqat.Linear) is not really related. It does add a weight fake quant through the nnqat.Linear module, but the input and output fake quants around the module are still inserted in the call to prepare, not convert, within prepare_qat.
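
One way to check this is to run prepare_qat on a tiny model and look at what ends up on the swapped module. The sketch below is only an illustration: the one-layer model and the variable names are made up, and it uses the torch.ao.quantization namespace. The attribute names weight_fake_quant and activation_post_process are what I would expect to see in eager mode, where inputs are normally handled by a QuantStub rather than by the module itself.

```python
import torch.nn as nn
import torch.ao.nn.qat as nnqat
from torch.ao.quantization import get_default_qat_qconfig, prepare_qat
from torch.ao.quantization.fake_quantize import FakeQuantizeBase

# Hypothetical one-layer model, just enough to trigger the module swap.
float_model = nn.Sequential(nn.Linear(4, 4))
float_model.train()  # eager-mode prepare_qat expects training mode
float_model.qconfig = get_default_qat_qconfig("fbgemm")

prepared = prepare_qat(float_model)  # works on a copy by default

linear = prepared[0]
children = dict(linear.named_children())

# The internal "convert" call swaps nn.Linear for nnqat.Linear,
# which brings its own fake quant for the weight only.
is_swapped = isinstance(linear, nnqat.Linear)
has_weight_fq = isinstance(children.get("weight_fake_quant"), FakeQuantizeBase)

# The internal "prepare" call attaches the fake quant for the output.
has_output_fq = isinstance(
    children.get("activation_post_process"), FakeQuantizeBase
)

# Each fake quant wraps an observer of its own.
inner_observer = type(linear.weight_fake_quant.activation_post_process).__name__

print(is_swapped, has_weight_fq, has_output_fq, inner_observer)
```

So the module ends up with one fake quant for the weight and one for the output. They cover different tensors, so neither duplicates the other.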

aichenaxx · June 9, 2023, 7:30am

That really helps. Thanks a lot! @andrewor

rick1 · September 11, 2023, 6:51am

Hello dear @andrewor,
I am also confused by this design. In prepare_qat_fx, an nn.op is first swapped to an nn.qat.op whose forward returns self.activation_post_process(self._conv_forward(input, self.weight_fake_quant(self.weight))).
So it seems the module already has an activation observer. But after that, insert_observers_for_model adds another observer for the activation.
This design appears somewhat redundant: the scales calculated by the two observers show only minor differences. Therefore, I would like to understand the reasons behind it.

jerryzh168 · September 12, 2023, 5:17pm

I think we don’t observe the activation in the nn.qat module itself. The activation observer is inserted in the parent model graph, around the nn.qat module.
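
A small FX-mode sketch can be used to check this on a given PyTorch version. It is an illustration with assumptions: TinyNet and the variable names are hypothetical, and it relies on get_default_qat_qconfig_mapping and the example_inputs argument of prepare_qat_fx, which need a reasonably recent release.

```python
import torch
import torch.nn as nn
from torch.ao.quantization import get_default_qat_qconfig_mapping
from torch.ao.quantization.fake_quantize import FakeQuantizeBase
from torch.ao.quantization.quantize_fx import prepare_qat_fx


class TinyNet(nn.Module):
    """Hypothetical single-layer model used only for inspection."""

    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 4)

    def forward(self, x):
        return self.fc(x)


model = TinyNet().train()
example_inputs = (torch.randn(1, 4),)
prepared = prepare_qat_fx(
    model, get_default_qat_qconfig_mapping("fbgemm"), example_inputs
)

# Children of the swapped QAT module: expect the weight fake quant only.
fc_children = [name for name, _ in prepared.fc.named_children()]

# Fake quants called from the parent graph, around the QAT module.
graph_fake_quants = [
    node.target
    for node in prepared.graph.nodes
    if node.op == "call_module"
    and isinstance(prepared.get_submodule(node.target), FakeQuantizeBase)
]

print(fc_children)
print(graph_fake_quants)
```

If the activation fake quants show up only in the second list, the activation is observed once, in the parent graph, and not a second time inside the nn.qat module.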