Using quantizable model for normal training

Hi, by “quantizable” I assume you mean the quantizable models in torchvision specifically. These exist for eager mode quantization, where the user has to manually insert QuantStubs and DeQuantStubs, as you mentioned. On their own the stubs just pass their input through, so they don't affect normal training. For QAT, however, preparing the model puts FakeQuantize modules at those points (and on the weights), and these do change the numerics of training. That’s the point of QAT in the first place: to improve the accuracy of the quantized model by making the training process “aware” that the model will ultimately be quantized.
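To make that concrete, here is a minimal sketch of the eager mode flow. `TinyNet` is a toy model I made up for illustration (it is not torchvision's code), and the `"fbgemm"` backend is just an assumption. It uses the `torch.ao.quantization` namespace, so adjust the imports if your PyTorch version differs.

```python
import torch
import torch.nn as nn
from torch.ao.quantization import (
    DeQuantStub,
    QuantStub,
    get_default_qat_qconfig,
    prepare_qat,
)
from torch.ao.quantization.fake_quantize import FakeQuantizeBase


class TinyNet(nn.Module):
    """Hypothetical 'quantizable' model: stubs are inserted by hand."""

    def __init__(self):
        super().__init__()
        self.quant = QuantStub()      # marks where float -> quantized happens
        self.fc = nn.Linear(4, 2)
        self.dequant = DeQuantStub()  # marks where quantized -> float happens

    def forward(self, x):
        return self.dequant(self.fc(self.quant(x)))


torch.manual_seed(0)
x = torch.randn(3, 4)

model = TinyNet().train()
# Before preparation the stubs are pass-throughs, so this is plain float math.
float_out = model(x)

# Attach a QAT qconfig and prepare a copy of the model (must be in train mode).
model.qconfig = get_default_qat_qconfig("fbgemm")
qat_model = prepare_qat(model, inplace=False)

# Count the fake-quantize modules that preparation added.
n_fake = sum(isinstance(m, FakeQuantizeBase) for m in qat_model.modules())
qat_out = qat_model(x)

print("fake-quantize modules:", n_fake)
print("max abs difference vs. float:", (float_out - qat_out).abs().max().item())
```

The original `model` is left untouched because of `inplace=False`, so you can keep training it normally and only train `qat_model` when you actually want QAT.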

So my recommendation is one of the following. Either make “quantizable” versions of your models similar to torchvision's, which rely on eager mode quantization, or switch to FX graph mode quantization, where you generally don’t have to change your model (as long as it is symbolically traceable) and it’ll still be quantized automatically, with FakeQuantizes inserted in the QAT case. You can learn more about FX graph mode quantization in the “(prototype) FX Graph Mode Quantization User Guide” in the PyTorch Tutorials documentation (2.0.1+cu117). Please feel free to let me know if there’s anything else I can clarify.
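For reference, a rough sketch of the FX route looks like this. `PlainNet` is again a made-up toy model with no stubs, the `"fbgemm"` backend is an assumption, and the `prepare_qat_fx(model, qconfig_mapping, example_inputs)` call shown here is the form used in recent releases (older versions took a qconfig dict instead).

```python
import torch
import torch.nn as nn
from torch.ao.quantization import get_default_qat_qconfig_mapping
from torch.ao.quantization.fake_quantize import FakeQuantizeBase
from torch.ao.quantization.quantize_fx import prepare_qat_fx


class PlainNet(nn.Module):
    """Hypothetical ordinary model: no QuantStub/DeQuantStub anywhere."""

    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 2)
        self.relu = nn.ReLU()

    def forward(self, x):
        return self.relu(self.fc(x))


torch.manual_seed(0)
example_inputs = (torch.randn(3, 4),)

# QAT preparation expects the model in training mode.
float_model = PlainNet().train()

# Default QAT settings for the assumed "fbgemm" backend.
qconfig_mapping = get_default_qat_qconfig_mapping("fbgemm")

# Trace the model and insert fake-quantize modules automatically.
prepared = prepare_qat_fx(float_model, qconfig_mapping, example_inputs)

n_fake = sum(isinstance(m, FakeQuantizeBase) for m in prepared.modules())
out = prepared(*example_inputs)

print("fake-quantize modules:", n_fake)
print("output shape:", tuple(out.shape))
```

From here you would train `prepared` as usual and then call `convert_fx` on it to get the quantized model.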

Best,
-Andrew