Extending Quantization-Aware Training


I would like to extend QAT to support the two cases below. Does anyone know how I can achieve these?

  1. mixed-precision: being able to set precision for each layer separately (manually)

  2. lower precisions: being able to fake-quantize to lower than 8-bit (using a QConfig?)


Raghu is adding support for sub-8-bit QAT right now, cc @raghuramank100.
I think mixed precision is supported as long as you have sub-8-bit observers; in eager mode quantization you'll need to set the qconfig manually for each child module.
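For the eager mode part, here is a minimal sketch of assigning a qconfig per child module. The model, layer names, and the "fbgemm" backend are illustrative, not from the thread:

```python
import torch
import torch.nn as nn
import torch.quantization as tq

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 8, 3)
        self.conv2 = nn.Conv2d(8, 16, 3)

    def forward(self, x):
        return self.conv2(self.conv1(x))

model = Net().train()   # prepare_qat expects training mode
model.qconfig = None    # no global default

# set a qconfig only on the layers you want quantized;
# layers whose qconfig resolves to None are left untouched
model.conv1.qconfig = tq.get_default_qat_qconfig("fbgemm")

tq.prepare_qat(model, inplace=True)

# conv1 is swapped for a QAT module carrying a weight fake-quant;
# conv2 stays a plain nn.Conv2d
print(hasattr(model.conv1, "weight_fake_quant"))
print(hasattr(model.conv2, "weight_fake_quant"))
```

With per-layer qconfigs like this, mixed precision reduces to handing different layers different observer/fake-quantize settings.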

Thanks @jerryzh168 and @raghuramank100.

Is there a way to have sub-8-bit fake-quantize observers (scale factors and zero points) in the current implementation?

You'll need to implement your own observer module (https://github.com/pytorch/pytorch/blob/master/torch/quantization/observer.py) and fake-quantize module (https://github.com/pytorch/pytorch/blob/master/torch/quantization/fake_quantize.py) to support sub-8-bit scale and zero_point.
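As a rough sketch of that idea (assuming the `FakeQuantize`/observer API, which accepts `quant_min`/`quant_max`), you can often get 4-bit fake quantization by narrowing the quantization range rather than writing a module from scratch; the specific ranges and observer choice below are illustrative:

```python
import torch
from torch.quantization import FakeQuantize, MovingAverageMinMaxObserver, QConfig

# 4-bit activation fake-quant: 16 levels in [0, 15], still stored as quint8
fq_act_4bit = FakeQuantize.with_args(
    observer=MovingAverageMinMaxObserver,
    quant_min=0,
    quant_max=15,
    dtype=torch.quint8,
    qscheme=torch.per_tensor_affine,
)

# 4-bit symmetric weight fake-quant: levels in [-8, 7], stored as qint8
fq_wt_4bit = FakeQuantize.with_args(
    observer=MovingAverageMinMaxObserver,
    quant_min=-8,
    quant_max=7,
    dtype=torch.qint8,
    qscheme=torch.per_tensor_symmetric,
)

# this QConfig can then be assigned per child module as above
qconfig_4bit = QConfig(activation=fq_act_4bit, weight=fq_wt_4bit)

# the observer learns the range and the output collapses to at most 16 levels
fq = fq_act_4bit()
y = fq(torch.randn(32))
print(y.unique().numel())  # at most 16
```

If the built-in range checks reject your settings in your PyTorch version, that is where a custom observer subclass comes in.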

Hi, have you solved this yet? I am learning QAT now…