How to set quantization-aware training scaling factors?

You'll need to implement your own fake quantize module (pytorch/fake_quantize.py at master · pytorch/pytorch · GitHub) to restrict the scaling factor to a power of two. We actually had an intern recently implement additive powers of two: pytorch/fake_quantize.py at master · pytorch/pytorch · GitHub.
The code for using it in the flow can be found in pytorch/apot_fx_graph_mode_qat.py at master · pytorch/pytorch · GitHub.
Paper: [1909.13144] Additive Powers-of-Two Quantization: An Efficient Non-uniform Discretization for Neural Networks
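
If you only need plain power-of-two scales (rather than the full APoT scheme), one option is to subclass `FakeQuantize` and round whatever scale the observer produces to the nearest power of two. A minimal sketch, assuming eager-mode QAT; `PowerOfTwoFakeQuantize` and the qconfig below are illustrative names, not the APoT implementation linked above:

```python
import torch
from torch.ao.quantization import QConfig, default_weight_fake_quant
from torch.ao.quantization.fake_quantize import FakeQuantize
from torch.ao.quantization.observer import MovingAverageMinMaxObserver


class PowerOfTwoFakeQuantize(FakeQuantize):
    """Fake quantize module whose scale is snapped to the nearest power of two.

    The observer tracks min/max as usual; we only round the resulting scale to
    2**round(log2(scale)) so hardware can implement the rescale as a shift.
    The zero_point is left unchanged here for simplicity.
    """

    def calculate_qparams(self):
        scale, zero_point = super().calculate_qparams()
        scale = torch.pow(2.0, torch.round(torch.log2(scale)))
        return scale, zero_point


# usage sketch: a QAT qconfig that uses the power-of-two fake quant for
# activations and the stock fake quant for weights
pot_act_fake_quant = PowerOfTwoFakeQuantize.with_args(
    observer=MovingAverageMinMaxObserver,
    quant_min=0,
    quant_max=255,
    dtype=torch.quint8,
    qscheme=torch.per_tensor_affine,
)
pot_qconfig = QConfig(activation=pot_act_fake_quant, weight=default_weight_fake_quant)
```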

For converting the model to an 8-bit FPGA backend, I think you'll need to follow the reference flow, which is only available in FX graph mode quantization right now; please take a look at rfcs/RFC-0019-Extending-PyTorch-Quantization-to-Custom-Backends.md at master · pytorch/rfcs · GitHub. You will get a model with q/dq/fp32 ops that represents a quantized model, and you can then lower that model to the FPGA (I guess you'd need to expose the ops implemented on the FPGA in PyTorch?). The lowering code for the native PyTorch backends (fbgemm/qnnpack) can be found in pytorch/_lower_to_native_backend.py at master · pytorch/pytorch · GitHub. A rough end-to-end sketch of the FX graph mode QAT + reference flow is below.
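
This sketch only assumes the public FX graph mode APIs; exact names depend on the PyTorch version (older releases use `convert_fx(..., is_reference=True)` instead of `convert_to_reference_fx`), and `SmallNet` is just a placeholder model:

```python
import torch
from torch.ao.quantization import get_default_qat_qconfig_mapping
from torch.ao.quantization.quantize_fx import prepare_qat_fx, convert_to_reference_fx


class SmallNet(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = torch.nn.Conv2d(3, 8, 3)
        self.relu = torch.nn.ReLU()

    def forward(self, x):
        return self.relu(self.conv(x))


model = SmallNet().train()
example_inputs = (torch.randn(1, 3, 32, 32),)

# insert fake quantize modules for QAT; a custom QConfigMapping built from
# the power-of-two fake quant above could be swapped in here
qconfig_mapping = get_default_qat_qconfig_mapping("qnnpack")
prepared = prepare_qat_fx(model, qconfig_mapping, example_inputs)

# ... run the QAT training loop on `prepared` here ...

# produce the reference quantized model: quantized ops are expressed as
# quantize/dequantize + fp32 op patterns that a custom backend
# (e.g. an FPGA toolchain) can pattern-match and lower
prepared.eval()
reference_model = convert_to_reference_fx(prepared)
print(reference_model.graph)
```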
