Hi @jerryzh168, the operator QuantizedConv2d's output is currently int8. Is there a configuration to force it to float32 (e.g. by deleting the observer at the output, intuitively)? If there is just a single conv2d layer, its quantized version QuantizedConv2d will requantize the output to int8, and a dequantize layer then follows to convert the output back to float32, which may hurt performance. In short, I want the quantized conv to produce a float32 output directly, skipping the requantize-then-dequantize round trip.
The op itself only has one output type: QuantizedConv2d's output is always quint8. So your options are to either not quantize it, or to dequantize the quantized output.
Thanks for your reply, but I can't import XNNPACKQuantizer in the PyTorch 2.0.1 distribution. The backend I use is qnnpack, with a QAT config of (weight: per_tensor_symmetric, input: per_tensor_symmetric, output: float32).
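For the eager-mode QAT path that does work on 2.0.1, a config matching that description might be sketched as below. This is an assumption about your setup, not your exact config; note the "output float32" part is not expressed in the QConfig itself but by where you place the DeQuantStub in the model (as in the dequantize option above).

```python
import torch
import torch.ao.quantization as tq

# Sketch of a QAT qconfig with per-tensor symmetric fake-quant for both
# activations (quint8, zero point fixed at 128) and weights (qint8).
qat_qconfig = tq.QConfig(
    activation=tq.FakeQuantize.with_args(
        observer=tq.MovingAverageMinMaxObserver,
        qscheme=torch.per_tensor_symmetric,
        dtype=torch.quint8,
    ),
    weight=tq.FakeQuantize.with_args(
        observer=tq.MovingAverageMinMaxObserver,
        qscheme=torch.per_tensor_symmetric,
        dtype=torch.qint8,
    ),
)
```

You would assign this as `model.qconfig = qat_qconfig` before `tq.prepare_qat(model)`, with `torch.backends.quantized.engine = "qnnpack"` set for conversion.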