How to quantize a CNN to 4 bits?

We do not currently support 4-bit quantization, but contributions are welcome. Do you just want to try quantization-aware training, or do you want to run 4-bit kernels, etc.?
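If the goal is only to experiment with quantization-aware training, the 4-bit arithmetic can be simulated in floating point without any 4-bit kernels. Below is a minimal, framework-free sketch of that idea, assuming signed symmetric per-tensor quantization to the range [-8, 7]. The function name `fake_quantize_4bit` and the sample weights are hypothetical and not part of any library API. In real training, the rounding step would typically be paired with a straight-through estimator so gradients can flow.

```python
def fake_quantize_4bit(values, qmin=-8, qmax=7):
    """Simulate signed 4-bit symmetric quantization: quantize, then dequantize.

    Hypothetical helper for illustration; not a library function.
    """
    max_abs = max(abs(v) for v in values)
    # Fall back to a scale of 1.0 when every value is zero (avoids dividing by 0).
    scale = max_abs / qmax if max_abs > 0 else 1.0
    out = []
    for v in values:
        q = round(v / scale)         # snap to the nearest integer level
        q = max(qmin, min(qmax, q))  # clamp to the 16 levels of a signed 4-bit int
        out.append(q * scale)        # map back to float ("fake" quantization)
    return out, scale


# Illustrative weights only; a real layer would hold a tensor of learned values.
weights = [0.7, -0.35, 0.1, 0.0, -0.7]
quantized, scale = fake_quantize_4bit(weights)
print(quantized, scale)
```

Values produced this way stay in floating point, so this only models the accuracy impact of 4-bit weights. It gives none of the speed or memory benefit of real 4-bit kernels.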