Hi, I believe -quantization-precision only supports Int8 and Int16 right now. 4-bit quantization is currently supported by only a few ops, such as EmbeddingBag and SparseLengthsSum, since those ops often load from extremely large embedding tables that can be shrunk significantly with 4-bit quantization.
If you wanted to use it for other operators, we'd need to expand support to each of those operators.
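For context, here is a minimal NumPy sketch (not Glow's actual implementation) of the whole-tensor affine quantization that the Int8/Int16 precision modes imply: one scale/offset pair is derived from the tensor's observed min/max and applied to every element.

```python
import numpy as np

def quantize_affine(tensor, num_bits=8):
    """Affine (asymmetric) quantization with a single scale/offset
    for the whole tensor. Illustrative sketch only, not Glow code."""
    qmin, qmax = 0, 2 ** num_bits - 1
    lo, hi = float(tensor.min()), float(tensor.max())
    scale = (hi - lo) / (qmax - qmin)
    if scale == 0.0:
        scale = 1.0  # guard against a constant tensor
    q = np.clip(np.round((tensor - lo) / scale), qmin, qmax).astype(np.int32)
    return q, scale, lo

def dequantize(q, scale, offset):
    return q * scale + offset

x = np.array([-1.0, 0.0, 0.5, 2.0])
q, scale, offset = quantize_affine(x, num_bits=8)
# round-to-nearest keeps each element within half a quantization step
assert np.all(np.abs(dequantize(q, scale, offset) - x) <= scale / 2 + 1e-9)
```

With num_bits=4 the same scheme gives only 16 levels for the entire tensor, which is usually too coarse; that is why the 4-bit embedding ops use a finer-grained scheme instead.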
Thank you for the information. I'd like to try 4-bit quantization, if possible, to compare the accuracy and performance of the models Glow generates.
For the operators that do support it, like EmbeddingBag and SparseLengthsSum, how can we enable 4-bit quantization? -quantization-precision does not support Int4 for now.
So right now I don’t think we have automatic Glow-based quantization support for this. We have pre-quantized Glow kernels for executing these ops, but they are only ever loaded pre-quantized from the input model, i.e. they are quantized in PyTorch, Caffe2, etc. before Glow ever loads them.
In order to support this we’d need to extend the Glow profiler to (1) support per-row profiling, and then (2) use that per-row profiling to do the 4-bit quantization.
(Note that our 4-bit quantization support for EmbeddingBag and SparseLengthsSum uses row-wise quantization in both cases.)
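To make the row-wise scheme concrete, here is a hedged NumPy sketch (again, not Glow's real kernels, which also pack two 4-bit values per byte): each row of the embedding table gets its own scale and offset computed from that row's min/max, which is why the profiler would need per-row statistics before it could emit 4-bit quantized tables.

```python
import numpy as np

def rowwise_quantize_4bit(table):
    """Quantize each row of a 2-D embedding table to 4-bit codes
    (0..15) with a per-row scale and offset. Illustrative sketch."""
    lo = table.min(axis=1, keepdims=True)
    hi = table.max(axis=1, keepdims=True)
    scale = (hi - lo) / 15.0
    scale[scale == 0] = 1.0  # guard against constant rows
    q = np.clip(np.round((table - lo) / scale), 0, 15).astype(np.uint8)
    return q, scale, lo

def rowwise_dequantize(q, scale, offset):
    return q * scale + offset

rng = np.random.default_rng(0)
table = rng.normal(size=(4, 8)).astype(np.float32)
q, scale, offset = rowwise_quantize_4bit(table)
recon = rowwise_dequantize(q, scale, offset)
# per-row error is bounded by half of that row's quantization step
assert np.all(np.abs(recon - table) <= scale / 2 + 1e-6)
```

Because each row has its own range, an outlier in one row no longer widens the quantization step for every other row, which is what makes 4 bits viable for large embedding tables.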