I am trying to run a post-training quantized linear layer using the PT2 export (PT2E) quantization flow. However, the final quantized model produced by the `convert_pt2e` API keeps the linear matmul op in fp32 precision, with quantize/dequantize ops placed in succession around it in the graph.
E.g., the layer I am testing is `nn.Linear(5, 10)` followed by `nn.ReLU()`.
Quantizing with the XNNPACK quantizer results in the following quantized model:
The pattern here is q -> dq -> linear_op (fp32).
How can I specify the quantization so that the pattern becomes q -> linear_op (lower precision) -> dq?