Quantization of depthwise 1d convolution with QAT is slower than non-quantized

I am a rookie at QAT, trying to decrease the latency of a speech recognition model with QAT (from float32 to qint8). From my experiments, QAT does improve the processing speed of Conv1d and Conv2d layers a lot. However, when I quantize the depthwise 1d convolution layers, those layers are even slower after quantization. Am I using a wrong qconfig or something? Any suggestions? Thank you!
Details:
depthwise 1d convolution layer:
(depthwise_conv): Conv1d(512, 512, kernel_size=(31,), stride=(1,), padding=(15,), groups=512)
Qconfig during QAT training ('x86'):
(activation_post_process): HistogramObserver(min_val=inf, max_val=-inf)
processing time before QAT: 4.7ms
processing time after QAT: 56ms
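To make the comparison fair, it helps to time the depthwise layer in isolation under identical conditions. A minimal benchmark sketch (the batch size and sequence length are assumptions, not taken from the original model):

```python
import time
import torch
import torch.nn as nn

torch.set_num_threads(1)  # single-threaded timing is more reproducible

# Same shape as the layer in question
conv = nn.Conv1d(512, 512, kernel_size=31, stride=1, padding=15, groups=512)
conv.eval()
x = torch.randn(1, 512, 1000)  # (batch, channels, sequence length) -- assumed

with torch.inference_mode():
    for _ in range(10):  # warm-up iterations
        conv(x)
    start = time.perf_counter()
    for _ in range(100):
        conv(x)
    elapsed_ms = (time.perf_counter() - start) / 100 * 1000

print(f"mean latency: {elapsed_ms:.2f} ms")
```

Running the same loop on the converted (quantized) module with a quantized input tensor gives a like-for-like number for the two variants.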

The performance impact of quantization depends on your problem size, hardware, and config settings; any of those could be preventing a speedup on a specific layer.
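One quick config check worth doing (a sketch, not a diagnosis of this particular case): make sure the active quantized engine matches the backend the qconfig was built for, since a mismatch can force slow fallback kernels.

```python
import torch

# Which quantized kernel backends this PyTorch build supports,
# and which one is currently active
print(torch.backends.quantized.supported_engines)
print(torch.backends.quantized.engine)

# An 'x86' qconfig expects the matching engine to be selected
if 'x86' in torch.backends.quantized.supported_engines:
    torch.backends.quantized.engine = 'x86'
```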

Thank you for your reply. All the other convolution layers are faster than before, but the depthwise layers are about 10x slower. I am wondering whether this is normal or whether I am using a wrong config for them. Do you have any suggestions for the qconfig for depthwise convolutions?