PyTorch Mobile NNAPI is slower than CPU

I’m following this tutorial to try out NNAPI support in PyTorch Mobile on a Pixel 3a (Android 10, Qualcomm Snapdragon 670). I compiled the speed_benchmark_torch binary from commit 4ed7f36ed.

The commands I used are:

adb shell /data/local/tmp/speed_benchmark_torch --pthreadpool_size=1 --model=/data/local/tmp/mobilenetv2-quant_full-nnapi.pt --use_bundled_input=0 --warmup=5 --iter=200
adb shell /data/local/tmp/speed_benchmark_torch --pthreadpool_size=1 --model=/data/local/tmp/mobilenetv2-quant_full-cpu.pt --use_bundled_input=0 --warmup=5 --iter=200

Here are the benchmark numbers I got:

Quant mobilenet v2 using CPU: 66.8 ms
Quant mobilenet v2 using NNAPI: 171 ms
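
For reference, the relative slowdown implied by the two measurements above works out to roughly 2.6×:

```python
# Relative slowdown of the NNAPI run vs. the CPU run,
# using the measured latencies above (milliseconds per iteration).
cpu_ms = 66.8     # quantized MobileNetV2, CPU backend
nnapi_ms = 171.0  # quantized MobileNetV2, NNAPI backend

slowdown = nnapi_ms / cpu_ms
print(f"NNAPI is {slowdown:.1f}x slower than CPU")  # → NNAPI is 2.6x slower than CPU
```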

Not only did NNAPI fail to accelerate the model, it slowed it down significantly. I’m really curious what I’m missing here.

Update: after upgrading the phone to Android 11, NNAPI works as expected.