Quantized model is slow and gpu usage becomes high with qnnpack

Sining_Sun · November 4, 2020, 3:31am

Thanks. Apart from the LayerNorm problem, I found another one. Even when my network is very simple, for example just one Linear layer without LayerNorm, the CPU usage is very high after quantization. More details can be found in this post.

This problem has confused me for a long time.
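
For anyone who wants to reproduce this, here is a minimal sketch of the kind of setup I mean. It is not my exact script: the layer size (256), the batch size (32), the use of dynamic quantization, and the `bench` helper are all assumptions chosen for illustration. Pinning PyTorch to one thread is only there to make the CPU usage of the two models easier to compare.

```python
import time

import torch
import torch.nn as nn

# Select QNNPACK only if this PyTorch build supports it;
# otherwise the default quantized engine is left unchanged.
if "qnnpack" in torch.backends.quantized.supported_engines:
    torch.backends.quantized.engine = "qnnpack"

# Assumption: limit intra-op threads so CPU usage is comparable across runs.
torch.set_num_threads(1)

# A network with a single Linear layer and no LayerNorm (sizes are illustrative).
model = nn.Sequential(nn.Linear(256, 256)).eval()

# Assumption: dynamic int8 quantization of the Linear layer.
qmodel = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)

x = torch.randn(32, 256)


def bench(m, n=50):
    """Hypothetical helper: average seconds per forward pass over n runs."""
    with torch.no_grad():
        m(x)  # warm-up run, not timed
        start = time.perf_counter()
        for _ in range(n):
            m(x)
    return (time.perf_counter() - start) / n


fp32_time = bench(model)
int8_time = bench(qmodel)
print(f"engine: {torch.backends.quantized.engine}")
print(f"fp32: {fp32_time * 1e3:.3f} ms/iter, int8: {int8_time * 1e3:.3f} ms/iter")
```

Watching the process in `top` while this runs, with and without the `torch.set_num_threads(1)` line, should show whether the extra CPU usage comes from the number of threads the quantized kernels use.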