Why is quantization on the GPU actually not supported?
CPU quantization works really well, and the basic quantization algorithms seem mature and, at the conceptual level, not tied to any particular device. I understand that very large models present new challenges for quantization (e.g., outlier features), and I am thinking exclusively of PTQ (post-training quantization) here.
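For concreteness, the kind of device-agnostic algorithm I have in mind is plain symmetric absmax quantization, which is just a few lines of NumPy (a minimal sketch of the general idea, not any particular library's implementation):

```python
import numpy as np

def absmax_quantize(w: np.ndarray, bits: int = 8):
    """Symmetric absmax PTQ: map floats to signed ints with one scale per tensor."""
    qmax = 2 ** (bits - 1) - 1          # e.g. 127 for int8
    scale = np.max(np.abs(w)) / qmax    # single scalar scale
    q = np.clip(np.round(w / scale), -qmax, qmax).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.random.randn(4, 4).astype(np.float32)
q, s = absmax_quantize(w)
w_hat = dequantize(q, s)
# Rounding bounds the per-element reconstruction error by scale / 2
print(np.max(np.abs(w - w_hat)))
```

Nothing in this formulation cares whether the tensors live in host RAM or GPU memory, which is exactly why I am puzzled that the GPU path is unsupported.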
So, out of genuine curiosity: what makes GPU quantization different from CPU quantization, and why is it difficult to implement?