Hi @jerryzh168, the operator QuantizedConv2d's output is now int8. Is there a configuration to force it to float32 (e.g., intuitively, deleting the observer at the output)? If there is only a single conv2d layer, its quantized version QuantizedConv2d will requantize the output to int8, and a dequantize layer then follows to convert the output back to float32, which may hurt performance. What I want, expressed as a formula, is:
The op itself only has one output type: QuantizedConv2d's output is always quint8. So your options are to not quantize it, or to dequantize the quantized output.
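To illustrate the second option, here is a minimal sketch using the eager-mode quantized module (the layer parameters and quantization parameters here are arbitrary, chosen just for the example): the quantized conv always emits quint8, and an explicit `dequantize()` on the result gives float32.

```python
import torch

# Hypothetical single-layer example: a standalone quantized Conv2d.
# Its packed weights are quantized at construction with default qparams.
qconv = torch.ao.nn.quantized.Conv2d(3, 8, kernel_size=3)

# Quantize a float input to quint8 (scale/zero_point chosen arbitrarily).
x_f = torch.randn(1, 3, 16, 16)
x_q = torch.quantize_per_tensor(x_f, scale=0.05, zero_point=64, dtype=torch.quint8)

y_q = qconv(x_q)          # output dtype is quint8 -- this cannot be configured away
y_f = y_q.dequantize()    # explicit dequantize back to float32
```

The requantize-then-dequantize round trip at the boundary is exactly the overhead described above; in the eager flow it cannot be removed, only made explicit like this.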
This is not possible in the old flows, but it is possible in the new flow: (prototype) PyTorch 2.0 Export Post Training Static Quantization — PyTorch Tutorials 2.0.1+cu117 documentation. However, we need to onboard more backends before modeling users can rely on it. Which backend are you targeting right now?
Thanks for your reply, but I can't import XNNPACKQuantizer in the PyTorch 2.0.1 distribution. The backend I use is qnnpack, with a QAT config of (weight: per-tensor symmetric, input: per-tensor symmetric, output: float32).
We have the 2.1 release coming up soon. Could you wait a bit and upgrade to 2.1? You'll get XNNPACKQuantizer there.