Quantization is often tackled as a rewrite of the original model. We can overload Convolution (i.e., the convolution module), or we can add quantization layers before and after it (Glow style, possibly combined with the above). But if we use convolution in its functional form, we may want a different quantization for each slot (input, weights, and bias).
When I talk to HW people, they would like to break a convolution into a correlation and a bias addition, that is, to reorganize the convolution into two distinct operations. Quantization can then differ for the weights, the correlation, the bias, and their sum.
Quantization therefore affects the forward computation. It also affects the backward computation: the range of the quantization can be used as a parameter, and the gradient computation could/should use it for training.
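To make the split concrete, here is a minimal sketch of what I have in mind. It assumes simple symmetric fake quantization (round to a grid, then back to float); the helper names `fake_quantize` and `quantized_conv2d`, the scale values, and the bit widths are all made up for illustration and are not an existing PyTorch API. `F.conv2d` without a bias already computes the correlation.

```python
import torch
import torch.nn.functional as F


def fake_quantize(x, scale, bits=8):
    """Round x onto a signed integer grid with step `scale`, then map back to float."""
    qmin, qmax = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return torch.clamp(torch.round(x / scale), qmin, qmax) * scale


def quantized_conv2d(x, weight, bias, scales):
    """Convolution split into a correlation and a bias addition, one scale per slot."""
    xq = fake_quantize(x, scales["input"])
    wq = fake_quantize(weight, scales["weight"])
    # Correlation only: no bias is passed to conv2d.
    corr = fake_quantize(F.conv2d(xq, wq, bias=None), scales["correlation"], bits=16)
    bq = fake_quantize(bias, scales["bias"], bits=16)
    # Separate bias addition, broadcast over batch and spatial dimensions.
    out = corr + bq.view(1, -1, 1, 1)
    return fake_quantize(out, scales["output"])


torch.manual_seed(0)
x = torch.randn(1, 3, 8, 8)
w = torch.randn(4, 3, 3, 3)
b = torch.randn(4)
# Hypothetical per-slot scales; in practice they would come from calibration or training.
scales = {"input": 0.05, "weight": 0.02, "correlation": 0.001, "bias": 0.001, "output": 0.1}
y = quantized_conv2d(x, w, b, scales)
```

Each slot has its own scale and bit width, which is the kind of freedom that is hard to get when the convolution module is overloaded as a whole.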
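One way the range could take part in training is sketched below. It assumes a straight-through estimator for the rounding step, which is a common choice but not necessarily what any particular tool does; `RoundSTE` and `LearnedRangeQuant` are hypothetical names of mine.

```python
import torch


class RoundSTE(torch.autograd.Function):
    """Round in the forward pass; pass the gradient through unchanged in the backward pass."""

    @staticmethod
    def forward(ctx, x):
        return torch.round(x)

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output


class LearnedRangeQuant(torch.nn.Module):
    """Symmetric fake quantizer whose clipping range is a trainable parameter."""

    def __init__(self, init_range=1.0, bits=8):
        super().__init__()
        self.range = torch.nn.Parameter(torch.tensor(float(init_range)))
        self.levels = 2 ** (bits - 1) - 1

    def forward(self, x):
        scale = self.range / self.levels
        # Clip to [-range, range]; saturated elements send their gradient to the range.
        x = torch.minimum(torch.maximum(x, -self.range), self.range)
        return RoundSTE.apply(x / scale) * scale


q = LearnedRangeQuant(init_range=1.0, bits=4)
x = torch.tensor([-2.0, -0.3, 0.26, 3.0], requires_grad=True)
y = q(x)
y.sum().backward()
# x.grad is zero where the input was clipped; q.range.grad tells the optimizer
# how to move the range.
```

The point is only that the quantizer changes both the forward values and the gradients, so it has to be in the graph before autograd sees it.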
Quantization as a pass
In this forum, there is a nice tutorial on how to introduce an optimization pass. This pass uses CustomFuseGraph. The Good: it boils down to an import. The Bad: FuseGraph optimizations are based on same-input, same-output operations, and convolution does not belong here [PLEASE CORRECT ME]. This pass will change the forward computation, thus the pass should be done before any AUTOGRAD. With this example, we do not have much control over when the optimization runs, and it seems to run too late.
Trainable and Automatic quantization for TF/Caffe
What the automatic tools for TF and Caffe do is modify the computation graph, guided by pattern recognition and HW requirements: they insert quantization layers, train the network, and then remove those layers for inference. After that, a dedicated compiler takes the computation graph and writes code for a specific architecture.
Quantization as a jit pass
The way I see it, it would be nice to register a jit pass. This pass must run before gradient computation. It would basically be an IR graph manipulation in which a few targeted operations are each replaced by a subgraph. Because the inputs are completely qualified, the “rewrite” of the graph can be local, complete, and without side effects (nicely functional).
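To show what I mean by a local rewrite, here is a toy sketch in plain Python. It deliberately does not use the real jit IR or any torch.jit API: `Node`, the op names, and `quantize_conv_pass` are hypothetical stand-ins for whatever the actual pass infrastructure provides.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """One operation in a toy IR: a result name, an op, and the names of its inputs."""
    name: str
    op: str
    inputs: tuple = ()


def quantize_conv_pass(graph):
    """Return a new node list in which every `conv` node is expanded into a subgraph."""
    out = []
    for node in graph:
        if node.op != "conv":
            out.append(node)
            continue
        x, w, b = node.inputs
        n = node.name
        out += [
            Node(f"{n}.qx", "quant", (x,)),
            Node(f"{n}.qw", "quant", (w,)),
            Node(f"{n}.corr", "correlate", (f"{n}.qx", f"{n}.qw")),
            Node(f"{n}.qb", "quant", (b,)),
            # The last node keeps the original name, so consumers need no rewiring.
            Node(n, "add", (f"{n}.corr", f"{n}.qb")),
        ]
    return out


graph = [
    Node("x", "input"),
    Node("w", "param"),
    Node("b", "param"),
    Node("c1", "conv", ("x", "w", "b")),
    Node("r1", "relu", ("c1",)),
]
rewritten = quantize_conv_pass(graph)
```

The pass only looks at the matched node and its inputs, builds a new list instead of mutating the old one, and leaves every other node untouched. That is the property I would like the real pass to have.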
Question for the masters
Could you let me know how to get started in a practical way?
Please, hit me with questions, pointers, GitHub links … whatever you consider important.