PyTorch quantization: inference data types

I am new to PyTorch quantization and confused about two points :joy:
1. PyTorch has two quantization workflows: QAT and PTQ. Do they use the same operators in the framework during the inference phase when deployed on a mobile device? Will they have the same performance?
2. Why are activations quantized to uint8 while weights are quantized to int8? Is it because int8 weights don't need to subtract a zero point? :sweat_smile:

  1. Yes, the operators used during inference are the same for both the QAT and PTQ flows. On mobile, some of the kernels use QNNPACK for inference, so performance may differ on mobile compared to server.
  2. This is due to the requirements of the underlying kernels that perform the GEMM operation, i.e. FBGEMM and QNNPACK.
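To see the uint8-activation / int8-weight convention concretely, here is a minimal sketch using the eager-mode static PTQ API (module and qconfig names may vary slightly across PyTorch versions; the small `M` model here is just an illustration):

```python
import torch
import torch.nn as nn

class M(nn.Module):
    def __init__(self):
        super().__init__()
        self.quant = torch.quantization.QuantStub()    # float -> quantized
        self.fc = nn.Linear(4, 4)
        self.dequant = torch.quantization.DeQuantStub()  # quantized -> float

    def forward(self, x):
        return self.dequant(self.fc(self.quant(x)))

m = M().eval()
m.qconfig = torch.quantization.get_default_qconfig("fbgemm")  # server backend
torch.quantization.prepare(m, inplace=True)
m(torch.randn(1, 4))                      # one calibration pass
torch.quantization.convert(m, inplace=True)

# Activations use the unsigned quantized dtype (torch.quint8) ...
x = torch.quantize_per_tensor(torch.randn(1, 4), scale=0.1,
                              zero_point=0, dtype=torch.quint8)
print(x.dtype)               # torch.quint8
# ... while the converted layer's weight is signed (torch.qint8).
print(m.fc.weight().dtype)   # torch.qint8
```

So in PyTorch terms the answer is quint8 activations and qint8 weights, matching what FBGEMM and QNNPACK expect from their GEMM kernels.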

About point 2: what are the data types for activations and weights in QNNPACK? The code shows that activations are uint8, but the weights are passed as `void*`. :joy: