I apologize if this question is covered elsewhere.
I would like to perform quantization-aware training, but with the model initialized using the quantization parameters of a pre-trained, post-training-quantized model (e.g., a torchvision quantized model whose layers start with the same scale, zero_point, etc. as the pre-trained quantized model).
That is, I’d like the initial model used for QAT to produce the same output as a pre-trained model that has been quantized using a post-training method (e.g., static quantization).
Is there an easy way to achieve this? I tried hacking around it by manually setting some of the QAT model's FakeQuantize parameters, but was unable to get it working properly (a rough sketch of what I tried is below).
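Roughly what I attempted, simplified to the activation quantizers. This is only an illustrative sketch, not working code: `init_qat_from_ptq` is a helper name I made up, and it assumes the PTQ model comes from the usual static-quantization flow (prepare → calibrate → convert) on the same architecture as the QAT model, so that module names line up:

```python
import torch
import torch.quantization as tq

def init_qat_from_ptq(qat_model, ptq_model):
    """Copy activation scale/zero_point from a converted PTQ model into the
    FakeQuantize modules of a prepare_qat()-ed model. Illustrative only."""
    ptq_modules = dict(ptq_model.named_modules())
    for name, module in qat_model.named_modules():
        # QAT modules carry their activation FakeQuantize as activation_post_process
        if not hasattr(module, "activation_post_process"):
            continue
        ptq_mod = ptq_modules.get(name)
        # Converted quantized modules expose output scale/zero_point as attributes
        if ptq_mod is None or not hasattr(ptq_mod, "scale"):
            continue
        fq = module.activation_post_process
        fq.scale.data.fill_(float(ptq_mod.scale))
        fq.zero_point.data.fill_(int(ptq_mod.zero_point))
        # Freeze the observer so the copied values are not overwritten
        # by the first forward passes of QAT (assumption on my part).
        fq.disable_observer()
    return qat_model
```

Even with something like this, the QAT model's output does not match the PTQ model's, so I suspect I am missing something (weight fake quantizers, fused modules, or observer state).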
I appreciate any help! Please let me know if my question is unclear and I will rephrase it.