Int16 precision support in quantization

Hello everyone,
I’m using the quantization-schema=symmetric_with_power2_scale and quantization-precision=Int8 for a new device. Except for a few operators like LRN, this schema works fine. But I’d like to have Int16 precision also. I’ve been studying the “base.h”, “quantization.h” & “quantization.cpp” modules but I’m a little confused on how to progress. Any help or guidance is welcome. Thanks

Hi @iviarcio, we have an ElemKind::Int16QTy, though I don't think it's used very extensively. Have you read through our doc on quantization? I'm not sure exactly where your confusion lies, so it's hard to say what info you need.
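
For reference, an Int16-quantized type is built the same way as an Int8 one, just with a different element kind. A minimal sketch, assuming Glow's Module::uniqueType API (the scale/offset values and function name here are placeholders, not anything from the Glow tree):

#include "glow/Graph/Graph.h"

using namespace glow;

// Sketch only: build an Int16-quantized tensor type. The dims match a
// resnet50 input; the scale/offset values are illustrative.
void makeInt16QuantizedType(Module &mod) {
  TypeRef int16Ty = mod.uniqueType(ElemKind::Int16QTy, {1, 3, 224, 224},
                                   /* scale */ 0.0005f, /* offset */ 0);
  (void)int16Ty; // pass this type when creating quantized nodes or tensors
}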

Sorry @jfix, but I started studying Glow recently, so my questions are pretty basic. What I've done so far to support a new device is the following: I created a new backend based on the CPU backend (mostly by removing code from CPULLVMIRGen) and registered it. Then I rewrote transformPostLowering based on the OpenCL backend. I can compile and validate the low-level IR representation for the new device's requirements in Int8. Example:
../../glow/build/bin/image-classifier images/*.png \
-image-layout=NCHW -image-mode=0to1 \
-dump-profile=profile.yaml \
-model=resnet50 -model-input-name="gpu_0/data"

../../glow/build/bin/model-compiler -load-profile=profile.yaml \
-model=resnet50 -model-input="gpu_0/data",float,[1,3,224,224] \
-backend=NMP -emit-bundle=bundle \
-quantization-schema=symmetric_with_power2_scale \
-quantization-precision=Int8 -quantization-precision-bias=Int8 \
-dump-ir > resnet50.lir
As you can see in the gist resnet50.lir, both the NCHW layout and the quantization are OK. But when I try to use Int16 for quantization, the resulting file resnet50-16.lir does not show the expected result. Any help or guidance is welcome. Thanks again, Jordan.
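
Roughly, the backend setup described above looks like the sketch below. The NMPBackend/NMPFactory names are only guesses based on the -backend=NMP flag, the include paths are illustrative, and the transformPostLowering signature and factory-registration requirements may differ between Glow revisions:

#include "CPUBackend.h" // path illustrative; lives under lib/Backends/CPU
#include "glow/Backend/Backend.h"

namespace glow {

// Hypothetical backend derived from the CPU backend.
class NMPBackend : public CPUBackend {
public:
  // Used by the factory registration in some Glow revisions.
  static std::string getName() { return "NMP"; }
  static unsigned numDevices() { return 1; }

  std::string getBackendName() const override { return "NMP"; }

  // Device-specific graph rewrites, modeled on the OpenCL backend's version.
  Expected<bool>
  transformPostLowering(Function *F, CompilationContext &cctx,
                        const runtime::DeviceInfo *devInfo) const override {
    bool changed = false;
    // ... NMP-specific node rewrites go here ...
    return changed;
  }
};

// Make the backend selectable via -backend=NMP.
REGISTER_GLOW_BACKEND_FACTORY(NMPFactory, NMPBackend);

} // namespace glow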

So it looks like nothing is being quantized. One note here is that the default CPU backend doesn't have any Int16 quantization support AFAIK. So unless you have added that support yourself, this is probably why you aren't seeing any Int16 in the model: the quantization flow will skip an op if Backend::isOpSupported() returns false for it. You will need to add Int16 versions of those kernels to libjit, and update your backend's isOpSupported() to return true for the newly supported ops with Int16. And there may be other issues in LLVM IR gen, not sure.
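
To make that concrete, here is a rough sketch of what the isOpSupported() change might look like, continuing the hypothetical NMPBackend above. The node kinds and element-kind lists are illustrative only; the real list depends on which libjit kernels actually get Int16 implementations:

// Hypothetical NMPBackend::isOpSupported; same file as the skeleton above.
bool NMPBackend::isOpSupported(const NodeInfo &NI) const {
  switch (NI.getKind()) {
  case Kinded::Kind::ConvolutionNodeKind:
  case Kinded::Kind::FullyConnectedNodeKind:
    // Advertise Int16 so the quantization flow does not skip these ops.
    return NI.allInputsAndOutputsHaveSameElemKind(
        {ElemKind::FloatTy, ElemKind::Int8QTy, ElemKind::Int16QTy});
  default:
    // Everything else falls back to the inherited CPU backend behavior.
    return CPUBackend::isOpSupported(NI);
  }
}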

I got it. Thanks, @jfix. Now I have Int16 quantization working :grinning: resnet50-16.lir
