I noticed that in the official tutorial, the author used torch.quantization.prepare_qat(model) to prepare a QAT model. However, there are three steps inside this function: propagate_qconfig, convert, and prepare. In the convert step, the module is converted into a fake-quant module, and I think an observer has already been added to the module at that point. But in the prepare step, an observer is added to the module again. So why add an observer to a fake-quant module?
In general, the prepare step inserts observers, the calibrate/train step populates the statistics in these observers, and the convert step replaces the observers with quant/dequant ops. This is true for both PTQ and QAT. I suppose your question is about the implementation of prepare_qat specifically. The "convert" call inside prepare_qat is actually mostly for custom modules and is in no way related to the convert step I described above. It is just an implementation detail that reuses the convert-step code to swap special modules specified by the user. You can still think of the larger prepare_qat as a single prepare step that primarily inserts observers. Please let me know if anything's unclear.
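To make the prepare → calibrate/train → convert flow concrete, here is a minimal eager-mode sketch. The toy model and qconfig choice are my own illustration, not from the tutorial:

```python
import torch
import torch.nn as nn
import torch.quantization as tq

class Small(nn.Module):
    def __init__(self):
        super().__init__()
        self.quant = tq.QuantStub()      # marks where float -> quantized
        self.conv = nn.Conv2d(3, 8, 3)
        self.relu = nn.ReLU()
        self.dequant = tq.DeQuantStub()  # marks where quantized -> float

    def forward(self, x):
        return self.dequant(self.relu(self.conv(self.quant(x))))

model = Small().train()
model.qconfig = tq.get_default_qat_qconfig("fbgemm")

# 1. prepare: swaps modules and inserts FakeQuantize modules
#    (observer + fake-quant in one), e.g. nn.Conv2d -> nn.qat.Conv2d
prepared = tq.prepare_qat(model)

# 2. train/calibrate: forward passes populate the observer statistics
prepared(torch.randn(1, 3, 16, 16))

# 3. convert: replaces fake-quants/observers with real quant/dequant ops
quantized = tq.convert(prepared.eval())
```

After step 1, prepared.conv carries both a weight_fake_quant and an activation_post_process attribute, which is the two-level structure discussed in this thread.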
@andrewor Thanks a lot for your reply. I still cannot understand the "add observer" part of this process. I think the "convert" call inside prepare_qat has transformed the original module into a module with a FakeQuantize module, for example nn.Linear ==> nnqat.Linear. This nnqat.Linear module has its own observer, and the "prepare" call inside prepare_qat also adds an observer. After the whole prepare process, does this module have two observers? If so, why is one observer not enough?
Hi aichenaxx, in QAT the "observer" we insert is actually a FakeQuantize, but we use the same logic to insert them. Each FakeQuantize actually has an observer inside it, so when I say "insert observer" what I really mean is "insert observer or fake quantize." The module swap you're referring to (nn.Linear ==> nnqat.Linear) is not really related: it does add a weight fake quant through the nnqat.Linear module, but the input and output fake quants around the module are still inserted in the call to prepare.
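A small sketch of this two-level structure, assuming the eager-mode prepare_qat API (the toy model here is mine, not from the thread):

```python
import torch
import torch.nn as nn
import torch.quantization as tq

m = nn.Sequential(nn.Linear(4, 4)).train()
m.qconfig = tq.get_default_qat_qconfig("fbgemm")
prepared = tq.prepare_qat(m)

qat_linear = prepared[0]
# the module swap (nn.Linear -> nn.qat.Linear) adds the *weight* fake quant:
print(type(qat_linear))
print(type(qat_linear.weight_fake_quant))
# the *activation* fake quant is attached by the prepare/insert-observer logic:
print(type(qat_linear.activation_post_process))
# and each fake quant wraps a plain observer that collects min/max statistics:
print(type(qat_linear.weight_fake_quant.activation_post_process))
```

So there is no duplication: one fake quant is for the weight (added by the swap), the other is for the activation (added by prepare), and each contains exactly one observer.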
That really helps. Thanks a lot! @andrewor
Hello dear @andrewor,
I'm also confused by this design. In prepare_qat_fx, an nn op is first swapped to an nn.qat op whose forward returns
self.activation_post_process(self._conv_forward(input, self.weight_fake_quant(self.weight))),
so it seems to already have an activation observer. But after that there is an insert_observers_for_model pass, which adds an observer for the activation again.
This design appears somewhat redundant, since comparing the scales calculated twice seems to yield only minor differences. I would like to understand the reasons behind it.
We don't observe the activation in the nn.qat module, I think; the activation observer is inserted into the parent model's graph for the nn.qat module.
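This can be checked with a small prepare_qat_fx sketch (assuming a recent torch.ao.quantization FX API; exact observer/node names may vary across PyTorch versions):

```python
import torch
import torch.nn as nn
from torch.ao.quantization import get_default_qat_qconfig_mapping
from torch.ao.quantization.quantize_fx import prepare_qat_fx

model = nn.Sequential(nn.Conv2d(3, 8, 3)).train()
qconfig_mapping = get_default_qat_qconfig_mapping("fbgemm")
example_inputs = (torch.randn(1, 3, 16, 16),)
prepared = prepare_qat_fx(model, qconfig_mapping, example_inputs)

# The swapped module carries only the *weight* fake quant:
print(type(prepared.get_submodule("0")))

# The *activation* fake quants live as separate observer nodes
# (e.g. activation_post_process_0) in the parent graph:
for node in prepared.graph.nodes:
    print(node.op, node.target)
```

In other words, the nn.qat module observes its weight, and insert_observers_for_model places the activation observers as nodes around it in the traced graph, so each tensor is still observed only once.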