Why qnnpack configurations?

Hi Team,

I am doing transfer learning on a pre-trained quantized ResNet50 model (trained via QAT), with configurations set for qnnpack (to test on Android via adb) and for x86 (to test on a PC).

I thought only the qnnpack model could run on Android, but the x86 version also runs fine, and with the same inference time.
QNNPACK

adb shell  /data/local/tmp/pt/speed_benchmark_torch --model  /data/local/tmp/pt/quantized_resnet_qnnpack.pt --input_dims="1,3,224,224" --input_type=float --warmup=5 --iter 20
Starting benchmark.
Running warmup runs.
Main runs.
Main run finished. Microseconds per iter: 85391.2. Iters per second: 11.7108

With x86 configs:

adb shell  /data/local/tmp/pt/speed_benchmark_torch --model  /data/local/tmp/pt/quantized_resnet.pt --input_dims="1,3,224,224" --input_type=float --warmup=5 --iter 20
Starting benchmark.
Running warmup runs.
Main runs.
Main run finished. Microseconds per iter: 85238.3. Iters per second: 11.7318

Now I wonder: what is the purpose of qnnpack, then?
Also, I had to make an architectural change to my model, moving the dequant in the qnnpack case, because after quantization.convert the model hit this error:

terminating with uncaught exception of type c10::NotImplementedError: 
Could not run 'quantized::linear' with arguments from the 'CPU' backend.

Original model: quant → L1 → L2 → L3 … → dequant → linear (new classifier head)
Updated for qnnpack: quant → L1 → L2 → L3 … → linear (new classifier head) → dequant
Used ref: (beta) Quantized Transfer Learning for Computer Vision Tutorial, PyTorch Tutorials 2.0.1+cu117 documentation.
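The restructuring above can be sketched roughly as follows (layer names and feature sizes here are illustrative assumptions, not from the post): moving the DeQuantStub after the new linear head means that, after convert(), the head receives quantized activations and becomes quantized::linear, instead of being fed fp32 input.

```python
import torch
import torch.nn as nn
from torch.ao.quantization import QuantStub, DeQuantStub

class QnnpackFriendlyHead(nn.Module):
    """Sketch of the 'dequant after the head' layout described in the post."""
    def __init__(self, backbone, in_features=512, num_classes=10):
        super().__init__()
        self.quant = QuantStub()      # fp32 -> quantized at the input
        self.backbone = backbone      # pre-trained quantizable layers (L1, L2, L3, ...)
        self.head = nn.Linear(in_features, num_classes)  # new classifier head
        self.dequant = DeQuantStub()  # quantized -> fp32, now at the very end

    def forward(self, x):
        x = self.quant(x)
        x = self.backbone(x)
        x = self.head(x)        # sees quantized activations after convert()
        return self.dequant(x)  # dequant moved AFTER the head
```

In the original layout the dequant sat before the head, so the head stayed an fp32 nn.Linear; with this ordering, convert() can swap it for the backend's quantized kernel.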

Note:

  1. Configuration while training:
     For x86 I used
     model[0].qconfig = torch.quantization.default_qat_qconfig
     while for qnnpack I used
     model[0].qconfig = torch.ao.quantization.get_default_qat_qconfig('qnnpack')
  2. Running adb on a Tensor G2 processor.
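For comparison, the two qconfigs in point 1 can also be requested explicitly per backend (a sketch; it assumes torch.quantization.default_qat_qconfig matches the fbgemm/x86 default, which is the usual behavior):

```python
import torch
from torch.ao.quantization import get_default_qat_qconfig

# QAT qconfig matched to the target backend's kernels
qconfig_x86 = get_default_qat_qconfig('fbgemm')   # x86 desktop/server
qconfig_arm = get_default_qat_qconfig('qnnpack')  # Arm (Android, Raspberry Pi)

# The observer/fake-quant settings differ between the two; notably fbgemm
# reduces the activation range (reduce_range) to avoid x86 instruction
# overflow, while qnnpack does not.
```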

Also, I want to run the model on a Raspberry Pi but don't know which model to go for.

QNNPACK aims to provide better performance when running a quantized model with the QNNPACK backend. It has high-performance kernels for both Arm and x86 CPUs.
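One way to see which backends a given PyTorch build actually has, and to pick one before running inference (a minimal sketch; the exact engine list depends on the build):

```python
import torch

# Engines compiled into this build; quantized kernels dispatch per this setting
print(torch.backends.quantized.supported_engines)

if 'qnnpack' in torch.backends.quantized.supported_engines:
    # qnnpack targets Arm but is also available on many x86 builds,
    # which is one reason both models can run on the same device
    torch.backends.quantized.engine = 'qnnpack'
```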

Sorry, I thought the question was about how to resolve the exception; it looks like that's not the main issue.

Can only the qnnpack-configured model run on the Raspberry Pi?
I ask because I am getting this error:

RuntimeError: could not create a primitive descriptor for a reorder primitive

cc @digantdesai @jerryzh168