How to set quantization-aware training scaling factors?

When I use quantization-aware training, the weight tensor scaling factor is a standard floating-point number.
I want to deploy my model as 8-bit on an FPGA, so the weight tensor scaling factor must be a power of two (an integer exponent). Is there such an option? What should I do?
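For reference, a minimal sketch (not part of the original question) showing that the scale produced by the default QAT qconfig is an arbitrary float; module paths assume `torch.ao.quantization` (PyTorch >= 1.10):

```python
import torch
import torch.ao.quantization as tq

# Default QAT qconfig for the fbgemm backend; the weight fake-quantize module
# observes the tensor and derives a floating-point scale per channel.
qconfig = tq.get_default_qat_qconfig("fbgemm")
weight_fake_quant = qconfig.weight()
weight_fake_quant(torch.randn(8, 16))          # observe a dummy weight tensor
scale, zero_point = weight_fake_quant.calculate_qparams()
print(scale)  # arbitrary floats such as 0.0274, 0.0311, ... -- not powers of two
```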

It seems that the quantization scheme is a little bit different. You can see the details here: https://github.com/pytorch/pytorch/wiki/torch_quantization_design_proposal

Depending on the fixed-point arithmetic you use, you can convert the float multiplier into a quantized_multiplier (integer) plus a right shift (integer); a sketch of the idea is below. Please check out https://github.com/pytorch/FBGEMM/blob/master/src/QuantUtils.cc#L107-L157
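For illustration, here is a rough Python sketch (my own, not the FBGEMM code itself) of that decomposition: approximate a real multiplier as an integer multiplier times a power-of-two right shift, so the target hardware only needs integer multiply and shift. The real code also handles saturation and per-channel multipliers; this only shows the core idea.

```python
import math

def choose_requant_multiplier(real_multiplier: float, precision: int = 31):
    """Approximate real_multiplier ~= quantized_multiplier * 2**(-right_shift)."""
    assert 0.0 < real_multiplier < 1.0
    # real_multiplier = significand * 2**exponent, with significand in [0.5, 1)
    significand, exponent = math.frexp(real_multiplier)
    quantized_multiplier = round(significand * (1 << precision))
    right_shift = precision - exponent
    if quantized_multiplier == (1 << precision):   # rounding pushed the significand to 1.0
        quantized_multiplier //= 2
        right_shift -= 1
    return quantized_multiplier, right_shift

def requantize(acc: int, quantized_multiplier: int, right_shift: int) -> int:
    """Integer-only multiply followed by a rounding right shift."""
    rounding = (1 << (right_shift - 1)) if right_shift > 0 else 0
    return (acc * quantized_multiplier + rounding) >> right_shift

m, s = choose_requant_multiplier(0.0037)
print(m, s, m / (1 << s))   # m * 2**(-s) closely approximates 0.0037
```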

@sunkr1995
I am facing the same issue. Did you find a way to do that?
Thanks

Did you find a way to do that?

Do you have a good way to restrict the scale to a power of two (2^n)?

You'll need to implement your own fake quantize module (pytorch/fake_quantize.py at master · pytorch/pytorch · GitHub) to restrict the scaling factor to a power of two; a rough sketch of the idea is below. We actually had an intern recently implement additive powers-of-two: pytorch/fake_quantize.py at master · pytorch/pytorch · GitHub.
The code for using it in the flow can be found in pytorch/apot_fx_graph_mode_qat.py at master · pytorch/pytorch · GitHub.
Paper: [1909.13144] Additive Powers-of-Two Quantization: An Efficient Non-uniform Discretization for Neural Networks
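For example, here is one minimal sketch of such a module (my own illustration, not the APoT implementation linked above): it wraps a standard observer and rounds the computed scale up to the nearest power of two. Module paths assume `torch.ao.quantization` (PyTorch >= 1.10).

```python
import torch
from torch.ao.quantization import FakeQuantize, QConfig, default_fake_quant
from torch.ao.quantization.observer import MinMaxObserver

class PowerOfTwoMinMaxObserver(MinMaxObserver):
    """MinMaxObserver whose scale is rounded up to the nearest power of two."""
    def calculate_qparams(self):
        scale, zero_point = super().calculate_qparams()
        # 2**ceil(log2(scale)) never shrinks the representable range
        scale = torch.pow(2.0, torch.ceil(torch.log2(scale)))
        return scale, zero_point

# Weight fake-quantize restricted to power-of-two scales; activations keep the default.
po2_weight_fake_quant = FakeQuantize.with_args(
    observer=PowerOfTwoMinMaxObserver,
    quant_min=-128,
    quant_max=127,
    dtype=torch.qint8,
    qscheme=torch.per_tensor_symmetric,
)
po2_qconfig = QConfig(activation=default_fake_quant, weight=po2_weight_fake_quant)
```

You can then assign `po2_qconfig` to the modules you care about (eager mode) or set it globally in the FX graph mode qconfig mapping, so the weights see power-of-two scales during QAT.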

For converting the model to 8-bit on an FPGA, I think you might need to follow the reference flow, which is only available in FX graph mode quantization right now; please take a look at rfcs/RFC-0019-Extending-PyTorch-Quantization-to-Custom-Backends.md at master · pytorch/rfcs · GitHub. You will get a model with q/dq/fp32 ops that represents a quantized model, and you can then lower that model to the FPGA (I guess you would need to expose the ops implemented on the FPGA in PyTorch?). The lowering code for the native PyTorch backends (fbgemm/qnnpack) can be found in pytorch/_lower_to_native_backend.py at master · pytorch/pytorch · GitHub. A sketch of the reference flow is below.
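To make that concrete, here is a hedged sketch of the FX graph mode QAT + reference flow. The API names (`prepare_qat_fx`, `convert_to_reference_fx`, `get_default_qat_qconfig_mapping`) assume a fairly recent PyTorch (>= 1.13), and the FPGA lowering step is backend-specific, so it is only indicated by a comment.

```python
import torch
import torch.nn as nn
from torch.ao.quantization import get_default_qat_qconfig_mapping
from torch.ao.quantization.quantize_fx import prepare_qat_fx, convert_to_reference_fx

model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.ReLU()).train()
example_inputs = (torch.randn(1, 3, 32, 32),)

# Default QAT qconfig mapping; for power-of-two scales you would substitute a
# custom qconfig (e.g. the sketch above) via qconfig_mapping.set_global(...).
qconfig_mapping = get_default_qat_qconfig_mapping("fbgemm")

prepared = prepare_qat_fx(model, qconfig_mapping, example_inputs)
# ... run QAT fine-tuning on `prepared` here ...

# Reference model: quantize/dequantize ops around fp32 (reference) modules.
reference = convert_to_reference_fx(prepared.eval())
print(reference.graph)
# Lowering: pattern-match q/dq + op pairs and replace them with your FPGA kernels,
# analogous to what _lower_to_native_backend.py does for fbgemm/qnnpack.
```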
