Thanks. Apart from the LayerNorm problem, I found another one. Even when my network is very simple, for example just one Linear layer without LayerNorm, the CPU usage is very high after quantization. More details can be found in the post.
This problem has confused me for a long time.