Hi Team,
I am doing transfer learning on a pre-trained quantized resnet50 model (trained via QAT), with configurations set for qnnpack (to test on Android via adb) and also for x86 (to test on PC).
I thought only the qnnpack model could run on Android, but the x86 version also runs fine, and at roughly the same inference time.
QNNPACK
adb shell /data/local/tmp/pt/speed_benchmark_torch --model /data/local/tmp/pt/quantized_resnet_qnnpack.pt --input_dims="1,3,224,224" --input_type=float --warmup=5 --iter 20
Starting benchmark.
Running warmup runs.
Main runs.
Main run finished. Microseconds per iter: 85391.2. Iters per second: 11.7108
With x86 configs
adb shell /data/local/tmp/pt/speed_benchmark_torch --model /data/local/tmp/pt/quantized_resnet.pt --input_dims="1,3,224,224" --input_type=float --warmup=5 --iter 20
Starting benchmark.
Running warmup runs.
Main runs.
Main run finished. Microseconds per iter: 85238.3. Iters per second: 11.7318
Now I wonder: what is the purpose of qnnpack, then?
Also, I had to make an architectural change to my model for the qnnpack case, moving the dequant, because after quantization.convert the model raised this error:
terminating with uncaught exception of type c10::NotImplementedError:
Could not run 'quantized::linear' with arguments from the 'CPU' backend.
Original model: quant → L1 → L2 → L3 … → dequant → linear (new classifier head)
Updated for qnnpack: quant → L1 → L2 → L3 … → linear (new classifier head) → dequant
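For context, here is a minimal sketch of what I mean by moving the dequant after the head. This is not my actual resnet50; the module names (features, classifier) and sizes are made-up stand-ins just to show the stub placement that avoids the quantized::linear-on-CPU error:

```python
import torch
import torch.nn as nn
import torch.ao.quantization as tq

class TinyNet(nn.Module):
    """Toy stand-in for backbone + new classifier head (illustrative only)."""
    def __init__(self):
        super().__init__()
        self.quant = tq.QuantStub()
        self.features = nn.Sequential(nn.Conv2d(3, 8, 3), nn.ReLU())
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.classifier = nn.Linear(8, 10)   # the new head
        self.dequant = tq.DeQuantStub()

    def forward(self, x):
        x = self.quant(x)
        x = self.features(x)
        x = self.pool(x).flatten(1)
        x = self.classifier(x)   # quantized linear now receives a quantized tensor
        x = self.dequant(x)      # dequant moved to after the head
        return x

# Fall back to fbgemm if this build has no qnnpack engine
engine = 'qnnpack' if 'qnnpack' in torch.backends.quantized.supported_engines else 'fbgemm'
torch.backends.quantized.engine = engine

m = TinyNet().eval()
m.qconfig = tq.get_default_qconfig(engine)
tq.prepare(m, inplace=True)
m(torch.randn(1, 3, 8, 8))        # one calibration pass
tq.convert(m, inplace=True)
out = m(torch.randn(1, 3, 8, 8))  # float output, dequantized after the head
```

With the dequant placed before classifier instead (as in my original model), convert turns the head into a quantized Linear that then receives a float CPU tensor, which is what triggers the NotImplementedError above.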
Reference used: (beta) Quantized Transfer Learning for Computer Vision Tutorial — PyTorch Tutorials 2.0.1+cu117 documentation.
Note:
- Configuration while training
For x86 I used
model[0].qconfig = torch.quantization.default_qat_qconfig
while for qnnpack I used
model[0].qconfig = torch.ao.quantization.get_default_qat_qconfig('qnnpack')
- Running adb on a Tensor G2 processor
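To make the two configurations comparable, I believe the inference engine and the QAT qconfig should come from the same backend string; here is a short sketch of what I mean (the model[0] indexing follows my setup above, and the backend choice is an assumption on my part, not something from the tutorial):

```python
import torch
import torch.ao.quantization as tq

# 'qnnpack' targets ARM (Android / Raspberry Pi); 'fbgemm' / 'x86' target PC-class CPUs.
backend = 'qnnpack'

# Derive the QAT fake-quant settings from the same backend string...
qat_qconfig = tq.get_default_qat_qconfig(backend)

# ...and select the matching inference kernels, if this build supports them.
if backend in torch.backends.quantized.supported_engines:
    torch.backends.quantized.engine = backend

# As in my setup above, the qconfig would then be attached to the backbone:
# model[0].qconfig = qat_qconfig
```

My understanding is that torch.quantization.default_qat_qconfig (which I used for x86) corresponds to the fbgemm-style settings, so the two branches of my config differ in more than just the engine.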
Also, I want to run the model on a Raspberry Pi but don't know which of the two models to go for.