About build_android.sh, LITE and NNAPI

We are currently working with PyTorch 1.9.1.
We were successful in using build_android.sh (non-lite interpreter) with the operator list to create static libraries. Linking them with -Wl,--start-group;-Wl,--whole-archive;${libtorch_LIBRARIES};-Wl,--no-whole-archive;-Wl,--end-group resolved all symbols, and we are able to run NNAPI models.
The first issue I would like to report (see "Specifying USE_VULKAN=0 in launching build_android.sh does not disable Vulkan", Issue #66196 in pytorch/pytorch on GitHub) is that if I specify USE_VULKAN=0 when launching the build_android.sh script, Vulkan is not actually disabled and I get this error:

CMake Error at cmake/VulkanCodegen.cmake:72 (message):
  Failed to gen spv.h and spv.cpp with precompiled shaders for Vulkan backend
Call Stack (most recent call first):
  caffe2/CMakeLists.txt:6 (include)

I was only able to get through this by manually editing the build_android.sh script.
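
For readers hitting the same error: the patch further down in this thread shows that the script only tested whether USE_VULKAN was non-empty, so USE_VULKAN=0 still added -DUSE_VULKAN=ON. Here is a minimal sketch of the two behaviours. The function names vulkan_arg_old and vulkan_arg_patched are hypothetical and exist only for this illustration; they are not part of build_android.sh.

```shell
#!/usr/bin/env bash
# Hypothetical helpers that mimic the USE_VULKAN check in scripts/build_android.sh.

# Original check: any non-empty value, including "0", turns Vulkan ON.
vulkan_arg_old() {
  local use_vulkan="${1:-}"
  if [ -n "${use_vulkan}" ]; then
    echo "-DUSE_VULKAN=ON"
  fi
}

# Patched check: forward whatever value the caller supplied.
vulkan_arg_patched() {
  local use_vulkan="${1:-}"
  if [ -n "${use_vulkan}" ]; then
    echo "-DUSE_VULKAN=${use_vulkan}"
  fi
}

echo "old:     $(vulkan_arg_old 0)"
echo "patched: $(vulkan_arg_patched 0)"
```

With the original check, the only way to keep Vulkan off from the environment is to leave USE_VULKAN unset or empty.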

The second problem is that if I specify BUILD_CAFFE2_MOBILE=1 in the script, I get an error towards the end of the build process:

[ 93%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/operators/rsqrt_op.cc.o
In file included from /home/claudio/pytorch/pytorch/caffe2/operators/rms_norm_op.cc:9:
In file included from /home/claudio/pytorch/pytorch/aten/src/ATen/Parallel.h:3:
/home/claudio/pytorch/pytorch/aten/src/ATen/core/ivalue.h:3:10: fatal error: 'ATen/core/TensorBody.h' file not found
#include <ATen/core/TensorBody.h>
         ^~~~~~~~~~~~~~~~~~~~~~~~

so TensorBody.h is, for some reason, not being generated properly.

Third, if I then try to build and use the lite interpreter (giving up BUILD_CAFFE2_MOBILE=1), loading the lite NNAPI model gives the same error reported in "PyTorch NNAPI support", Issue #644 in lutzroeder/netron on GitHub (that issue provides an example model):

2021-10-06 11:55:48.715 13202-13202<OMITTED> E/libc++abi: terminating with uncaught exception of type c10::Error: The implementation of class __torch__.torch.classes._nnapi.Compilation cannot be found. ()
    Exception raised from parseMethods at <OMITTED>pytorch/pytorch/torch/csrc/jit/mobile/import.cpp:426 (most recent call first):
    (no backtrace available)
2021-10-06 11:55:48.795 13202-13202/<OMITTED> A/libc: Fatal signal 6 (SIGABRT), code -6 (SI_TKILL) in tid 13202 (<OMITTED>), pid 13202 (<OMITTED>)

I launched build_android.sh with the following command line:
USE_VULKAN=0 ANDROID_DEBUG_SYMBOLS=0 ANDROID_ABI=arm64-v8a ANDROID_NDK=<path_to_ndk> BUILD_LITE_INTERPRETER=1 SELECTED_OP_LIST=<path_to_op_list> scripts/build_android.sh
I verified in the CMake cache that USE_NNAPI is properly set to 1.
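
For anyone who wants to repeat that check, here is a rough sketch of reading options out of the CMake cache. The helper name cmake_cache_value is hypothetical, and the build_android directory is an assumption about where the script puts its build tree; point CACHE at your own CMakeCache.txt if it lives elsewhere.

```shell
#!/usr/bin/env bash
# Print the value of a CMake cache entry, e.g. "USE_NNAPI:BOOL=1" -> "1".
# Usage: cmake_cache_value <path/to/CMakeCache.txt> <OPTION_NAME>
cmake_cache_value() {
  local cache_file="$1" name="$2"
  # Cache entries have the form NAME:TYPE=VALUE; strip everything up to "=".
  sed -n "s/^${name}:[A-Za-z]*=//p" "${cache_file}" | head -n 1
}

# Assumed cache location; adjust to your build directory.
CACHE="${BUILD_ROOT:-build_android}/CMakeCache.txt"
if [ -f "${CACHE}" ]; then
  for opt in USE_NNAPI USE_VULKAN BUILD_LITE_INTERPRETER; do
    echo "${opt}=$(cmake_cache_value "${CACHE}" "${opt}")"
  done
fi
```

Note that a cached USE_NNAPI=1 only says the option is on; it does not by itself show that the NNAPI sources ended up in the final library.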

cc @IvanKobzarev on the Vulkan issue.
BUILD_CAFFE2_MOBILE is for legacy caffe2 build (maybe we should consider deprecating it?). BUILD_LITE_INTERPRETER=1 is suggested for building mobile.

NNAPI is in the Prototype stage; see "(Prototype) Convert MobileNetV2 to NNAPI" in the PyTorch Tutorials 1.9.1+cu102 documentation. @David_Reiss does it work with 1.9.1?

Thanks Mengtao. Indeed, once I gave up BUILD_CAFFE2_MOBILE I did use BUILD_LITE_INTERPRETER=1. Let's see what David says; this _nnapi.Compilation runtime error is really preventing us from using the lite interpreter with NNAPI.

I believe it is not enabled in the prebuilt binaries for 1.9 because it was classified as prototype. Am I understanding correctly that you’re able to build from source and run NNAPI with the full JIT, but doing the same with the lite interpreter build is giving you a runtime error “_nnapi.Compilation cannot be found”? Is the only difference between the working and non-working configuration setting “BUILD_LITE_INTERPRETER=1”?

Thanks David. The answer is yes to all your questions. I can also run the lite CPU version of the model with lite enabled. The only lite model that does not work is the NNAPI one (but it's also the fastest), and the non-lite NNAPI model with non-lite PyTorch also works, and runs fast.

To be precise, in one case I run the NNAPI model deployed as non-lite in an interpreter built with BUILD_LITE_INTERPRETER=0 (successful), and in the other case the NNAPI model deployed as lite in an interpreter built with BUILD_LITE_INTERPRETER=1 (failure).
I cannot run the NNAPI model deployed as lite in an interpreter built with BUILD_LITE_INTERPRETER=0, because that build doesn't provide torch::jit::_load_for_mobile (the app fails at link time).

Note that my colleague Aurelien reported the same _nnapi.Compilation error for the distributed version of PyTorch lite (see "Linking errors with Pytorch Lite 1.9.0 in Android native app", post #3 by Aurelien). I am not sure that NNAPI being a prototype is the reason it didn't make it into the lite version of the library, since it is there in the non-lite version.

So I dug into it a bit. It seems to me that the problem is in this section of aten/src/ATen/CMakeLists.txt, which, by the way, hasn't changed between 1.9.1 and current master:

if(BUILD_LITE_INTERPRETER)
  set(all_cpu_cpp ${generated_cpp} ${core_generated_cpp} ${cpu_kernel_cpp})
  append_filelist("jit_core_sources" all_cpu_cpp)
  append_filelist("aten_cpu_source_non_codegen_list" all_cpu_cpp)
  append_filelist("aten_native_source_non_codegen_list" all_cpu_cpp)
  list(APPEND all_cpu_cpp ${Aten_TH_AVX_extra_src})
else()
  set(
    all_cpu_cpp ${base_cpp} ${ATen_CORE_SRCS} ${native_cpp}
    ${native_ao_sparse_cpp} ${native_sparse_cpp}
    ${native_quantized_cpp} ${native_mkl_cpp} ${native_mkldnn_cpp}
    ${native_utils_cpp} ${native_xnnpack} ${generated_cpp} ${core_generated_cpp}
    ${ATen_CPU_SRCS} ${ATen_QUANTIZED_SRCS} ${ATen_NNAPI_SRCS} ${cpu_kernel_cpp}
  )
endif()

It seems that the ATen_NNAPI_SRCS files are not included in the build when building the lite interpreter. (They are actually compiled, and I can find the .o files, but they are not linked into the output library.)
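
One way to confirm this on your own build is to list the members of the static library and look for NNAPI object files. This is only a sketch: has_nnapi_objects is a hypothetical helper, the library path is an assumption about where the build leaves libtorch_cpu.a, and matching on "nnapi" in the member name is a heuristic.

```shell
#!/usr/bin/env bash
# Read archive member names (one per line, as printed by `ar t`) on stdin
# and report whether any of them looks like an NNAPI object file.
has_nnapi_objects() {
  if grep -qi 'nnapi'; then
    echo "NNAPI objects present"
  else
    echo "NNAPI objects missing"
    return 1
  fi
}

# Assumed library path; adjust to your build or install directory.
LIB="${LIB:-build_android/lib/libtorch_cpu.a}"
if [ -f "${LIB}" ]; then
  ar t "${LIB}" | has_nnapi_objects || true
fi
```

If the objects are missing from the archive, the _nnapi.Compilation class never gets registered at runtime, which matches the error seen when loading the lite NNAPI model.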

We tried rebuilding PyTorch with this patch:

diff --git a/aten/src/ATen/CMakeLists.txt b/aten/src/ATen/CMakeLists.txt
index baf9666f11..19f9a78443 100644
--- a/aten/src/ATen/CMakeLists.txt
+++ b/aten/src/ATen/CMakeLists.txt
@@ -130,7 +130,7 @@ add_subdirectory(quantized)
 add_subdirectory(nnapi)
 
 if(BUILD_LITE_INTERPRETER)
-  set(all_cpu_cpp ${generated_cpp} ${core_generated_cpp} ${cpu_kernel_cpp})
+  set(all_cpu_cpp ${generated_cpp} ${core_generated_cpp} ${ATen_NNAPI_SRCS} ${cpu_kernel_cpp})
   append_filelist("jit_core_sources" all_cpu_cpp)
   append_filelist("aten_cpu_source_non_codegen_list" all_cpu_cpp)
   append_filelist("aten_native_source_non_codegen_list" all_cpu_cpp)
diff --git a/scripts/build_android.sh b/scripts/build_android.sh
index daad46e8fb..211f5bb429 100755
--- a/scripts/build_android.sh
+++ b/scripts/build_android.sh
@@ -147,7 +147,11 @@ if [ "${ANDROID_DEBUG_SYMBOLS:-}" == '1' ]; then
 fi
 
 if [ -n "${USE_VULKAN}" ]; then
-  CMAKE_ARGS+=("-DUSE_VULKAN=ON")
+  CMAKE_ARGS+=("-DUSE_VULKAN=${USE_VULKAN}")
+fi
+
+if [ -n "${USE_NNAPI}" ]; then
+  CMAKE_ARGS+=("-DUSE_NNAPI=${USE_NNAPI}")
 fi
 
 # Use-specified CMake arguments go last to allow overridding defaults


diff --git a/aten/src/ATen/CMakeLists.txt b/aten/src/ATen/CMakeLists.txt
index baf9666f11..19f9a78443 100644
--- a/aten/src/ATen/CMakeLists.txt
+++ b/aten/src/ATen/CMakeLists.txt
@@ -130,7 +130,7 @@ add_subdirectory(quantized)
 add_subdirectory(nnapi)
 
 if(BUILD_LITE_INTERPRETER)
-  set(all_cpu_cpp ${generated_cpp} ${core_generated_cpp} ${cpu_kernel_cpp})
+  set(all_cpu_cpp ${generated_cpp} ${core_generated_cpp} ${ATen_NNAPI_SRCS} ${cpu_kernel_cpp})
   append_filelist("jit_core_sources" all_cpu_cpp)
   append_filelist("aten_cpu_source_non_codegen_list" all_cpu_cpp)
   append_filelist("aten_native_source_non_codegen_list" all_cpu_cpp)
diff --git a/scripts/build_android.sh b/scripts/build_android.sh
index 4dcb00becb..211f5bb429 100755
--- a/scripts/build_android.sh
+++ b/scripts/build_android.sh
@@ -150,6 +150,10 @@ if [ -n "${USE_VULKAN}" ]; then
   CMAKE_ARGS+=("-DUSE_VULKAN=${USE_VULKAN}")
 fi
 
+if [ -n "${USE_NNAPI}" ]; then
+  CMAKE_ARGS+=("-DUSE_NNAPI=${USE_NNAPI}")
+fi
+
 # Use-specified CMake arguments go last to allow overridding defaults
 CMAKE_ARGS+=($@)

And it works! It requires launching build_android.sh with the extra arguments USE_VULKAN=OFF USE_NNAPI=ON (unless one wants to activate Vulkan).

Note that ATen_NNAPI_SRCS is only defined when USE_NNAPI is on, so it makes sense to add the contents of ATen_NNAPI_SRCS in the lite scenario too: the resulting library only grows if one explicitly asks for NNAPI.

@iseeyuan, do you think we should enable NNAPI by default on Lite interpreter builds for Android? It’s pretty small.

@David_Reiss I didn't realize that NNAPI was built in full-JIT only. It makes sense to me to do the same for the lite interpreter.

I think eventually it would be nice to have NNAPI through the unified API (the same way we integrated Core ML), but we could discuss it offline. cc @raziel

Thanks for the great finding, @caraffi !
