How to convert a 32-bit operation to a 4-bit or 8-bit operation on CPU?

To the best of my knowledge, existing quantization methods still perform their arithmetic in 32-bit.
I want to quantize the weights of a CNN to reduce its memory footprint and then deploy the quantized model on a mobile device. How can a 32-bit operation be converted to a 4-bit or 8-bit operation on CPU?

PyTorch quantization supports int8 (but not int4), with fast CPU kernels for mobile provided by QNNPACK. The static quantization tutorial at https://pytorch.org/tutorials/advanced/static_quantization_tutorial.html is a good starting point; to target mobile CPUs you would set the quantized backend to qnnpack, as sketched below.
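
Here is a minimal sketch of eager-mode post-training static quantization with the qnnpack backend. The `SmallCNN` model, its layer shapes, and the random calibration data are all illustrative stand-ins, not anything from the tutorial itself:

```python
import torch
import torch.nn as nn

# Illustrative model; QuantStub/DeQuantStub mark where tensors
# enter and leave the quantized (int8) region of the graph.
class SmallCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.quant = torch.quantization.QuantStub()
        self.conv = nn.Conv2d(3, 16, kernel_size=3)
        self.relu = nn.ReLU()
        self.dequant = torch.quantization.DeQuantStub()

    def forward(self, x):
        x = self.quant(x)        # fp32 -> int8
        x = self.relu(self.conv(x))
        return self.dequant(x)   # int8 -> fp32

model = SmallCNN().eval()

# Target mobile CPUs via the QNNPACK backend.
torch.backends.quantized.engine = 'qnnpack'
model.qconfig = torch.quantization.get_default_qconfig('qnnpack')

# Fuse conv + relu so they execute as a single quantized op.
model_fused = torch.quantization.fuse_modules(model, [['conv', 'relu']])

# Insert observers, calibrate on representative inputs
# (random tensors here purely for illustration), then convert.
model_prepared = torch.quantization.prepare(model_fused)
with torch.no_grad():
    for _ in range(10):
        model_prepared(torch.randn(1, 3, 32, 32))

model_int8 = torch.quantization.convert(model_prepared)
```

After `convert`, the conv weights are stored as int8 and the conv/relu computation runs through QNNPACK's quantized kernels, which is where the memory and speed savings on mobile come from.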