Steps/ideas for layer-wise quantization implementation

Hi,
I am interested in implementing layer-wise quantization in glow.
I have looked through the existing issue on this topic ([Quantization] layer-wise quantization · Issue #4982 · pytorch/glow · GitHub), but it seems to have been left untouched.

My question is: given a list of nodes of type glow::Node (or a list of node names, e.g. Conv_conv_1__2) in a model, where each node has its own quantization options (for example, enableChannelwiseOpt, quantizationCalibrationOpt, keepOriginalPrecisionForNodesOpt, …), how do we incorporate this information into the code? Below I will give a little more background.

Whether profiling or compiling a model, Glow's model loader takes roughly three steps (sketched below):

  1. load the model (loader.loadModel())
  2. get compilation settings (loader.getCompilationContext(QuantizationMode))
  3. compile the model using the CompilationContext config (loader.compile(cctx))
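
Roughly, in code (the method names just follow the list above; the exact signatures in tools/loader/Loader.h may differ between Glow versions, so treat this as orientation rather than exact API):

```cpp
#include "Loader.h" // tools/loader/Loader.h

// Sketch only: the three loader steps driven from a loader-based tool
// (e.g. the image-classifier).
void compileQuantizedModel(Loader &loader) {
  loader.loadModel();                  // 1. build the Function from the protobuf
  CompilationContext cctx =
      loader.getCompilationContext(QuantizationMode::Quantize); // 2. settings
  loader.compile(cctx);                // 3. optimize, quantize, and compile
}
```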

For simplicity, I think the first thing to do is to load and attach quantization options to all of the nodes in a model in step 2. To hold this information, we can add a map to precConfig, which is of type PrecisionConfiguration: the map's keys are node names, and each value is itself a map from quantization option to the actual preference, as sketched below. This would be done by modifying the Loader::getCompilationContext function in Loader.cpp.
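
For concreteness, here is a hypothetical sketch of what such a member could look like; none of these names exist in Glow today, they are just stand-ins for the command-line options mentioned above:

```cpp
#include <map>
#include <string>

// Hypothetical addition (not existing Glow code) to PrecisionConfiguration in
// include/glow/Optimizer/GraphOptimizer/CompilationContext.h.
struct NodeQuantizationOptions {
  bool enableChannelwise = false;       // per-node analogue of enableChannelwiseOpt
  bool keepOriginalPrecision = false;   // per-node analogue of keepOriginalPrecisionForNodesOpt
  std::string calibration = "none";     // per-node analogue of quantizationCalibrationOpt
  ElemKind precision = ElemKind::Int8QTy; // target quantized element type for this node
};

// Keyed by node name, e.g. "Conv_conv_1__2"; filled in
// Loader::getCompilationContext() and consulted later during quantization.
std::map<std::string, NodeQuantizationOptions> perNodeQuantOpts;
```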

Then, modify the transformForPrecisionMode function in GraphOptimizer.cpp, as mentioned in the issue linked above. This function is called from optimizeFunction in the same file, which in turn is called from Loader::compile in step 3 (I suppose).

In transformForPrecisionMode, if we are profiling, the profileQuantization function in GraphOptimizer/Quantization.cpp is called; if we want to actually execute quantization, the quantizeFunction function in quantization/Quantization.cpp is called.
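
To make the second path concrete, my rough understanding is that quantizeFunction currently takes a single QuantizationConfiguration for the whole Function, along the lines of the sketch below (field and parameter names are approximate; please correct me if I misread the code):

```cpp
// Approximate sketch (exact names/fields may differ by Glow version; see
// quantization/Quantization.h and quantization/Base/Base.h). Today a single
// QuantizationConfiguration applies to the whole Function:
void quantizeWholeFunction(Function *F, const Backend &B,
                           llvm::ArrayRef<NodeProfilingInfo> profilingInfos,
                           const LoweredInfoMap &loweredMap,
                           const KindSet &doNotQuantizeKinds) {
  quantization::QuantizationConfiguration quantConfig;
  quantConfig.infos.assign(profilingInfos.begin(), profilingInfos.end());
  quantConfig.schema = quantization::Schema::Asymmetric; // one schema for all nodes
  quantConfig.precision = ElemKind::Int8QTy;             // one precision for all nodes
  quantization::quantizeFunction(F, quantConfig, B, loweredMap, doNotQuantizeKinds);
  // Layer-wise quantization would mean having FunctionQuantizer consult
  // something like perNodeQuantOpts[node->getName()] (the hypothetical map
  // above) when it picks each node's quantized type, instead of this one
  // global config.
}
```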

Either way, I am not sure exactly where to tie the above layer-wise config into these functions. I don't see obvious modification points in the former case (profiling), because the profiling nodes being inserted have nothing to do with configs like the calibration method and precision selection; those only matter when quantization is actually executed. In the latter case, since quantizeFunction creates a FunctionQuantizer instance internally, I expect some member functions of that class would need to be modified. I looked at the class, but it's a little complex and I didn't manage to find where to make the changes.

Also, the glow::lower function in Lower.cpp probably needs some extension, since the KindSet doNotLowerKinds is handled there. What do you think?
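
For example, since doNotLowerKinds is keyed by node kind rather than node name, per-layer control would need an additional name-based check; something like the following hypothetical helper (not existing Glow code) could be consulted inside the lowering loop, on top of the existing per-kind check:

```cpp
#include <set>
#include <string>

// Hypothetical helper (not existing Glow code): skip lowering a node either
// because its kind is in doNotLowerKinds (today's mechanism) or because its
// name is in a per-layer "keep original" set (the proposed extension). This
// would be consulted inside the node loop of glow::lower in Lower.cpp.
static bool shouldSkipLowering(const Node &N, const KindSet &doNotLowerKinds,
                               const std::set<std::string> &keepNodesByName) {
  return doNotLowerKinds.count(N.getKind()) ||
         keepNodesByName.count(N.getName().str());
}
```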

To sum up, it is not clear to me which functions called in/after transformForPrecisionMode should be modified, and how. Additionally, if any functions other than transformForPrecisionMode need to be changed, I would like to know which ones.

Any hints or thoughts?

We actually have https://github.com/pytorch/glow/pull/5145, which is pretty similar and which I should land soon. Does it help solve what you're looking for?

Thank you! I wasn't aware of that PR.
I looked at it, and as I understand it, it basically inserts a ConvertTo node (via createConvertTo) into NodeList nodes_ while loading the protobuf and optimizing the graph.

The createConvertTo call traces back to the Node::addResult function in Node.cpp. It does not seem to actually convert the tensor in a node to the desired type (like FP16). Where is this conversion actually performed? In some quantization nodes? Maybe I don't understand the difference between ConvertToNode and QuantizeNode.
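
For reference, this is how I currently understand the two node kinds get created (using the helpers from glow/Graph/Graph.h, assuming F is the Function and input is some NodeValue; overload names may differ slightly by version), which is why I am confused about where the actual data conversion happens:

```cpp
// ConvertTo: a plain element-type cast, e.g. float -> float16; no
// quantization parameters are involved.
auto *toFP16 = F->createConvertTo("toFP16", input, ElemKind::Float16Ty);

// Quantize: float -> fixed point; the output type must carry scale/offset,
// typically derived from profiling data.
TypeRef qTy = F->getParent()->uniqueType(ElemKind::Int8QTy, input.dims(),
                                         /*scale=*/0.05f, /*offset=*/0);
auto *toInt8 = F->createQuantize("toInt8", input, qTy);
```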

Also, since what I ultimately want is to quantize each node with its own (different) quantization preferences (e.g. calibration technique, quantization precision, etc.), should we additionally modify quantization::quantizeFunction? I assume the PR you shared is not enough on its own to execute quantization at compilation time.

You are correct: that PR needs an extension to include a quantization specification. Basically we would want to include some spec in the yaml file there to further specify the dtype, the scale/bias to use, etc.
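
Purely as an illustration of the kind of spec I mean (the key names below are hypothetical; the real schema would be decided when the PR is extended), a per-node entry could look like:

```yaml
# Hypothetical per-node quantization spec; illustrative only.
- nodeName: Conv_conv_1__2
  dtype: int8
  scale: 0.0123      # quantization scale for this node's output
  offset: -3         # quantization offset / zero point ("bias")
  channelwise: true
- nodeName: Gemm_fc_1
  keepOriginalPrecision: true
```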

Thank you very much for the explanation!