Hi,
I would like to extend QAT to support the two cases below. Does anyone know how I can achieve these?
mixed precision: being able to set the precision for each layer separately (manually)
lower precisions: being able to fake-quantize to fewer than 8 bits (using a QConfig?)
Thanks!
Raghu is adding support for sub-8-bit QAT right now, cc @raghuramank100. I think mixed precision is supported as long as you can have sub-8-bit observers; in eager mode quantization you'll need to set the qconfig manually for each child module.
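A minimal sketch of what "set the qconfig manually for each child module" looks like in eager-mode QAT, using the stock 8-bit QAT qconfig; the model and module names (`MyModel`, `conv1`, `conv2`) are made up for illustration:

```python
import torch
import torch.nn as nn
import torch.quantization as tq

class MyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.quant = tq.QuantStub()
        self.conv1 = nn.Conv2d(3, 16, 3)
        self.conv2 = nn.Conv2d(16, 32, 3)
        self.dequant = tq.DeQuantStub()

    def forward(self, x):
        x = self.quant(x)
        x = self.conv1(x)
        x = self.conv2(x)
        return self.dequant(x)

model = MyModel().train()
# default 8-bit QAT qconfig for the whole model ...
model.qconfig = tq.get_default_qat_qconfig("fbgemm")
# ... overridden per child module, e.g. leave conv2 in floating point
# (or assign it a custom, e.g. sub-8-bit, qconfig instead)
model.conv2.qconfig = None

# fake-quant / observer modules are inserted according to each module's qconfig
model_prepared = tq.prepare_qat(model, inplace=False)
```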
Thanks @jerryzh168 and @raghuramank100.
Is there a way to have sub-8-bit fake-quantize observers (scale factors and zero point) in the current implementation?
you'll need to implement your own observer module (https://github.com/pytorch/pytorch/blob/master/torch/quantization/observer.py) and fake quantize module (https://github.com/pytorch/pytorch/blob/master/torch/quantization/fake_quantize.py) to support sub-8-bit scale and zero_point
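Depending on the bit width, you may not even need a new observer class: the existing `FakeQuantize` and `MovingAverageMinMaxObserver` take `quant_min`/`quant_max` arguments, so a narrowed range can emulate e.g. 4-bit. A sketch under that assumption (the 4-bit ranges and the `fourbit_qconfig` name are mine, not from the replies):

```python
import torch
from torch.quantization import FakeQuantize, MovingAverageMinMaxObserver, QConfig

# 4-bit unsigned range for activations: [0, 15]
fourbit_act_fq = FakeQuantize.with_args(
    observer=MovingAverageMinMaxObserver,
    quant_min=0, quant_max=15,
    dtype=torch.quint8,
    qscheme=torch.per_tensor_affine,
    reduce_range=False,
)

# 4-bit signed, symmetric range for weights: [-8, 7]
fourbit_weight_fq = FakeQuantize.with_args(
    observer=MovingAverageMinMaxObserver,
    quant_min=-8, quant_max=7,
    dtype=torch.qint8,
    qscheme=torch.per_tensor_symmetric,
    reduce_range=False,
)

fourbit_qconfig = QConfig(activation=fourbit_act_fq, weight=fourbit_weight_fq)

# assign it only to the layers you want at 4 bits, then prepare_qat as usual:
# model.conv2.qconfig = fourbit_qconfig
```

For quantization schemes that the built-in observers can't express (e.g. non-uniform grids), you would still subclass the observer and fake-quantize modules as suggested above.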
Hi, have you managed to implement this? I am learning QAT now…