How to set quantization-aware training scaling factors?

When I use quantization-aware training, the weight tensor scaling factor is a standard floating-point number.
I want to deploy my model as 8-bit on an FPGA, so the weight tensor scaling factor must be a power of two (an integer exponent). Is there such an option? What should I do?
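For reference, a minimal sketch (not part of the original question) showing that the scale produced by the default QAT qconfig is an arbitrary float; module paths assume `torch.ao.quantization` (PyTorch >= 1.10):

```python
import torch
import torch.ao.quantization as tq

# Default QAT qconfig for the fbgemm backend; the weight fake-quantize module
# observes the tensor and derives a floating-point scale per channel.
qconfig = tq.get_default_qat_qconfig("fbgemm")
weight_fake_quant = qconfig.weight()
weight_fake_quant(torch.randn(8, 16))          # observe a dummy weight tensor
scale, zero_point = weight_fake_quant.calculate_qparams()
print(scale)  # arbitrary floats such as 0.0274, 0.0311, ... -- not powers of two
```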

It seems that the quantization scheme is a little bit different. You can see the details here: https://github.com/pytorch/pytorch/wiki/torch_quantization_design_proposal

Depending on the fixed-point arithmetic you use, you can convert the float multiplier into a quantized_multiplier (integer) plus a right shift (integer); a sketch of the idea is below. Please check out https://github.com/pytorch/FBGEMM/blob/master/src/QuantUtils.cc#L107-L157
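For illustration, here is a rough Python sketch (my own, not the FBGEMM code itself) of that decomposition: approximate a real multiplier as an integer multiplier times a power-of-two right shift, so the target hardware only needs integer multiply and shift. The real code also handles saturation and per-channel multipliers; this only shows the core idea.

```python
import math

def choose_requant_multiplier(real_multiplier: float, precision: int = 31):
    """Approximate real_multiplier ~= quantized_multiplier * 2**(-right_shift)."""
    assert 0.0 < real_multiplier < 1.0
    # real_multiplier = significand * 2**exponent, with significand in [0.5, 1)
    significand, exponent = math.frexp(real_multiplier)
    quantized_multiplier = round(significand * (1 << precision))
    right_shift = precision - exponent
    if quantized_multiplier == (1 << precision):   # rounding pushed the significand to 1.0
        quantized_multiplier //= 2
        right_shift -= 1
    return quantized_multiplier, right_shift

def requantize(acc: int, quantized_multiplier: int, right_shift: int) -> int:
    """Integer-only multiply followed by a rounding right shift."""
    rounding = (1 << (right_shift - 1)) if right_shift > 0 else 0
    return (acc * quantized_multiplier + rounding) >> right_shift

m, s = choose_requant_multiplier(0.0037)
print(m, s, m / (1 << s))   # m * 2**(-s) closely approximates 0.0037
```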

@sunkr1995
I am facing the same issue. Did you find a way to do that?
Thanks

Did you find a way to do that?

Do you have a good way to restrict the scale to a power of two (2^n)?

You'll need to implement your own fake quantize module (pytorch/fake_quantize.py at master · pytorch/pytorch · GitHub) to restrict the scaling factor to a power of two; a rough sketch of the idea is below. We actually had an intern recently implement additive powers-of-two: pytorch/fake_quantize.py at master · pytorch/pytorch · GitHub.
The code for using it in the flow can be found in pytorch/apot_fx_graph_mode_qat.py at master · pytorch/pytorch · GitHub.
Paper: [1909.13144] Additive Powers-of-Two Quantization: An Efficient Non-uniform Discretization for Neural Networks
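For example, here is one minimal sketch of such a module (my own illustration, not the APoT implementation linked above): it wraps a standard observer and rounds the computed scale up to the nearest power of two. Module paths assume `torch.ao.quantization` (PyTorch >= 1.10).

```python
import torch
from torch.ao.quantization import FakeQuantize, QConfig, default_fake_quant
from torch.ao.quantization.observer import MinMaxObserver

class PowerOfTwoMinMaxObserver(MinMaxObserver):
    """MinMaxObserver whose scale is rounded up to the nearest power of two."""
    def calculate_qparams(self):
        scale, zero_point = super().calculate_qparams()
        # 2**ceil(log2(scale)) never shrinks the representable range
        scale = torch.pow(2.0, torch.ceil(torch.log2(scale)))
        return scale, zero_point

# Weight fake-quantize restricted to power-of-two scales; activations keep the default.
po2_weight_fake_quant = FakeQuantize.with_args(
    observer=PowerOfTwoMinMaxObserver,
    quant_min=-128,
    quant_max=127,
    dtype=torch.qint8,
    qscheme=torch.per_tensor_symmetric,
)
po2_qconfig = QConfig(activation=default_fake_quant, weight=po2_weight_fake_quant)
```

You can then assign `po2_qconfig` to the modules you care about (eager mode) or set it globally in the FX graph mode qconfig mapping, so the weights see power-of-two scales during QAT.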

For converting the model to 8-bit on an FPGA, I think you might need to follow the reference flow, which is only available in FX graph mode quantization right now; please take a look at rfcs/RFC-0019-Extending-PyTorch-Quantization-to-Custom-Backends.md at master · pytorch/rfcs · GitHub. You will get a model with q/dq/fp32 ops that represents a quantized model, and you can then lower that model to the FPGA (I guess you would need to expose the ops implemented on the FPGA in PyTorch?). The lowering code for the native PyTorch backends (fbgemm/qnnpack) can be found in pytorch/_lower_to_native_backend.py at master · pytorch/pytorch · GitHub. A sketch of the reference flow is below.
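To make that concrete, here is a hedged sketch of the FX graph mode QAT + reference flow. The API names (`prepare_qat_fx`, `convert_to_reference_fx`, `get_default_qat_qconfig_mapping`) assume a fairly recent PyTorch (>= 1.13), and the FPGA lowering step is backend-specific, so it is only indicated by a comment.

```python
import torch
import torch.nn as nn
from torch.ao.quantization import get_default_qat_qconfig_mapping
from torch.ao.quantization.quantize_fx import prepare_qat_fx, convert_to_reference_fx

model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.ReLU()).train()
example_inputs = (torch.randn(1, 3, 32, 32),)

# Default QAT qconfig mapping; for power-of-two scales you would substitute a
# custom qconfig (e.g. the sketch above) via qconfig_mapping.set_global(...).
qconfig_mapping = get_default_qat_qconfig_mapping("fbgemm")

prepared = prepare_qat_fx(model, qconfig_mapping, example_inputs)
# ... run QAT fine-tuning on `prepared` here ...

# Reference model: quantize/dequantize ops around fp32 (reference) modules.
reference = convert_to_reference_fx(prepared.eval())
print(reference.graph)
# Lowering: pattern-match q/dq + op pairs and replace them with your FPGA kernels,
# analogous to what _lower_to_native_backend.py does for fbgemm/qnnpack.
```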
