Compiling PyTorch C++ for Android

I think I've made some progress, but I'm stuck on something and hope someone knows the answer.

My approach was to take tools/build_libtorch.py and tools/build_pytorch_libs.sh and modify the cmake arguments, using what scripts/build_android.sh does as a reference: setting ANDROID_NDK, CMAKE_TOOLCHAIN_FILE, and so on.

There were some errors that I managed to get around by setting -DCMAKE_CROSSCOMPILING=1, -DRUN_HAVE_STD_REGEX=0, -DANDROID_STL="c++_static" and LDFLAGS="-llog".
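
To give an idea of the shape, the extra arguments amount to roughly this kind of configure invocation (a sketch from memory; the source directory and the toolchain-file path are illustrative, not an exact recipe):

LDFLAGS="-llog" cmake ../pytorch \
    -DCMAKE_TOOLCHAIN_FILE=$ANDROID_NDK/build/cmake/android.toolchain.cmake \
    -DANDROID_STL="c++_static" \
    -DCMAKE_CROSSCOMPILING=1 \
    -DRUN_HAVE_STD_REGEX=0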

But now I'm getting errors related to protobuf. I noticed a warning:

WARNING: Target "libprotoc" has EXCLUDE_FROM_ALL set and will not be built by default but an install rule has been provided for it.  CMake does not define behavior for this case.
WARNING: Target "protoc" has EXCLUDE_FROM_ALL set and will not be built by default but an install rule has been provided for it.  CMake does not define behavior for this case.

The (I think) relevant code has a comment saying that, pending an MR, the behaviour could be changed, and that MR has since been merged. But I'm not knowledgeable enough to know how to change the code… Edit: found the issue.

The errors I'm still getting are:

[...]/t/pytorch-cpu/third_party/onnx/onnx/onnx_onnx_torch.pb.h:38:3: error: expected expression
  static const ::google::protobuf::internal::ParseTableField entries[];
  ^
[...]/pytorch-cpu/third_party/onnx/onnx/onnx_onnx_torch.pb.h:37:17: error: variable has incomplete type 'struct ONNX_API'
struct ONNX_API TableStruct {
                ^
[...]/pytorch-cpu/third_party/onnx/onnx/onnx_onnx_torch.pb.h:37:8: note: forward declaration of 'protobuf_onnx_2fonnx_5fonnx_5ftorch_2eproto::ONNX_API'
struct ONNX_API TableStruct {
       ^
[...]/pytorch-cpu/third_party/onnx/onnx/onnx_onnx_torch.pb.h:45:6: error: variable has incomplete type 'void'
void ONNX_API AddDescriptors();
     ^
[...]/pytorch-cpu/third_party/onnx/onnx/onnx_onnx_torch.pb.h:45:14: error: expected ';' after top level declarator
void ONNX_API AddDescriptors();
             ^
             ;
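
From the shape of these errors it looks to me like the ONNX_API visibility macro is not getting defined in this configuration, so the compiler parses struct ONNX_API TableStruct as the declaration of a struct named ONNX_API. If that is right, a crude workaround might be to force the macro to be empty before the generated header is pulled in (just a sketch, assuming a static build where an empty macro is safe):

/* hypothetical workaround: make the visibility macro expand to nothing */
#ifndef ONNX_API
#define ONNX_API
#endif
#include "onnx/onnx_onnx_torch.pb.h"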

Hi! I am very interested in PyTorch compilation for Android. Did you make any further progress?

Thanks!

Nope. None at all. Caffe2 didn't work for me either (I suspect it's because I'm using RNNs). I switched to TensorFlow (to my great displeasure).

Argh, I see. I am trying to make the Caffe2 option work with ONNX, but no luck so far. Thanks!

I wrote a bit about how to get Caffe2 (the AICamera example) to work on Android. As you can tell from the other bits of the blog post, I think libtorch would be neat to have, too, and while I have a quick and dirty "feasibility study" port, I think such a project would need commercial backing. If that works out, we could have libtorch on Android by the end of the year.

Best regards

Thomas

@tom

Thanks for replying to the message I sent (about the program hanging when calling new caffe2::Predictor, which is not something you had seen from anyone else). I'm also curious which commit of pytorch and which version of the NDK you used (15/16/17?), if you wouldn't mind sharing?

I used some (arbitrary) master version plus, as described in the blog post, a few tweaks. I used NDK 18, as that was the one Android Studio happened to download. I did adapt the predictor calling in my fork of AICamera, but once it compiled, I didn't observe hanging.

Best regards

Thomas

It seems there are a lot of people (1, 2, 3) with this problem, but of course no responses.

But does the master version of caffe2 work for you? I used that and it went reasonably well. (I must admit I haven't spent much time with caffe2/android since I found libtorch/android to be feasible, because I think that is the future I prefer. In a week we'll know a bit more about how soon that's going to happen.)

Best regards

Thomas

No, it does not. In fact I had to make some modifications to the caffe2 source so it would even compile (I had the error no member named 'set_device_id' in 'caffe2::DeviceOption'). I'm guessing you haven't actually tried to use any of your own models; I've found comments saying that even when the demo worked, using other models did not. This Twitter thread was quite revealing.

Hm. No, I didn't add my own model, and it would be quite a bummer if that did not work. Is the model you're trying to use available somewhere? Then I could drop it into my version.
I didn't have to change the caffe2 source, so that could probably be something where things went wrong for you. How did you try to compile? I put my procedure in the README.md of my AICamera fork.
That said, I cannot understand the outrage over the difference between having the bias in the convolution layer vs. adding it separately. Yes, performance wants fused kernels, but hey, for a toy app…
But thanks for the advertisement for how much easier doing this with only PyTorch is. :wink:

Best regards

Thomas

Okay. I'm first going to talk about getting your app to work, then about trying with mine. I'm using NDK 18 now, by the way.

I compile using git clone --recursive and ANDROID_NDK=<path/to/Android/Sdk/ndk-bundle> scripts/build_android.sh -DCTOOLCHAIN=clang.

Using pytorch from your github link leads to the problem that it tries to get eigen from https://github.com/RLovelett/eigen.git, which doesn't exist. I fix that by getting eigen from the current remote repo. Then compilation fails with error: call to 'stod' is ambiguous; of course there is a github issue on this with no response. When the build finishes, not all requested libraries have been built, and I realize that for some reason pytorch/c10 doesn't exist. git submodule update does nothing, so I give up and just try the latest pytorch.

So I get that and compile it (and it worked! amazing). Obviously I change the paths to the static libraries to where I have them. The first build attempt fails; I need to add libqnnpack. I am, by the way, using the header files you included (so they are potentially from a different version). I get an error:

../../../../src/main/cpp/ATen/core/TensorImpl.h:755: error: undefined reference to 'caffe2::GetAllocator(at::DeviceType const&)'
../../../../src/main/cpp/ATen/core/TensorImpl.h:763: error: undefined reference to 'at::PlacementDeleteContext::makeDataPtr(at::DataPtr&&, void (*)(void*, unsigned int), unsigned int, at::Device)'

So I stop using your headers and instead include my own:

include_directories(git/pytorch-android git/pytorch-android/aten/src
    git/pytorch-android/torch/lib/include git/pytorch-android/build_android
    git/pytorch-android/build_host_protoc/include git/pytorch/build/aten/src/)

Note the git/pytorch/build/aten/src/: I had to point at the build of pytorch I have for local use, because running scripts/build_android.sh doesn't actually create all the required ATen header files.

And after that it worked!

Now about my app.
Because of a 'multiple definitions' error with another library (openfst), I compile normally and then again with -DBUILD_SHARED_LIBS=YES (after renaming the build_android dir, of course). That second build doesn't complete, but it gets far enough to give me a libc10.so, which I use to avoid the error. (I'm mentioning this just to give a fuller picture of what I'm doing.)

Then the app runs but can't instantiate the Predictor (it fails on new caffe2::Predictor(init_net, predict_net)), and I don't get any error message. I used onnx_graph_to_caffe2_net to export the model, since mobile_exporter doesn't seem to like Long inputs, but with a model that takes floats I get the same error in the end anyway.
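
To rule out the export itself, the local check I run is essentially this (a sketch, assuming the old two-NetDef Predictor constructor from the AICamera era; header locations moved around between versions, and the file names are mine):

#include <fstream>
#include <iostream>
#include <sstream>
#include "caffe2/core/init.h"
#include "caffe2/predictor/predictor.h"

// read a serialized NetDef protobuf from disk
static bool loadNet(const std::string& path, caffe2::NetDef* net) {
  std::ifstream in(path, std::ios::binary);
  if (!in) {
    return false;
  }
  std::ostringstream buf;
  buf << in.rdbuf();
  return net->ParseFromString(buf.str());
}

int main(int argc, char** argv) {
  caffe2::GlobalInit(&argc, &argv);
  caffe2::NetDef init_net, predict_net;
  if (!loadNet("init_net.pb", &init_net) ||
      !loadNet("predict_net.pb", &predict_net)) {
    std::cerr << "could not read/parse the NetDef files" << std::endl;
    return 1;
  }
  try {
    caffe2::Predictor predictor(init_net, predict_net);
    std::cout << "predictor constructed fine" << std::endl;
  } catch (const std::exception& e) {
    // on the desktop build you at least get the exception text
    std::cerr << "predictor failed: " << e.what() << std::endl;
  }
  return 0;
}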

My model is not related to the camera thing at all, but it's quite small (6 MB) and simple to use. See this gist for how I run it (using pytorch built for my PC, just to check whether caffe2 works locally). The model is here.
EDIT: But I think the model is irrelevant. Even just a single fully connected layer as a model results in the same problem.

I'd say the 'outrage' is just about the fact that a lot of people are having problems but no one seems to get any responses. Even just a

I didn't have to change the caffe2 source, so that could probably be something where things went wrong for you.

is very helpful (so even if you just say "this seems like you did something wrong"). But most people aren't getting anything at all.

Before I start getting into technical things: what are you trying to achieve?
If you just want to vent, I don't have to judge that, but I'd rather not engage with it. I'm just a random guy on the internet trying to help out other people every once in a while, and to be honest I sometimes ask myself why.

Now, the eigen repository changed its URL quite a while ago (which seems not unreasonable to me), and for some reason you picked up the old one (maybe because I don't use the master branch in my github repo and it is old enough to still have it; but I don't know, and git submodule apparently does not handle changes in the upstream repository, probably because no one who hit the problem showed up to fix it; I certainly am guilty of that). Similarly, I could point out that we're all doing git submodule update --init --recursive or so these days.
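
If someone wants to fix up an existing checkout rather than re-clone, something along these lines should do it (the replacement URL is from memory, so treat it as illustrative):

git config --file=.gitmodules submodule.third_party/eigen.url https://github.com/eigenteam/eigen-git-mirror.git
git submodule sync third_party/eigen
git submodule update --init --recursive third_party/eigen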

It's probably fair to say that some of the things happening in the following come from mixing different versions and builds. Including the headers seems like a not-so-stable recipe when people might use different versions of the lib, and I must admit I am hoping that someone else finds out how to get all the headers in one place nicely and tells me.

I am not quite able to discern what you did from your description. It's no wonder that typical bug report templates ask people to provide exact instructions to reproduce. The issues you are seeing likely stem from the complexity of the interactions between the various bits, so that apparently instructions such as the ones I put out don't allow you to replicate what I did. But anyone trying to look into your issue will have the exact same problem, so it would seem a natural reaction to say "yeah well, I'll answer someone who provided more detail instead". It might be nicer to drop a note of that sort into the bug report, but I'll tell you from experience that it's as frustrating for the people trying to find out what went wrong as for the reporter, so it's easy for incomplete bug reports to fall through the cracks.

To me, your report sounds like: I started with what you said, then I did all sorts of crazy things, then it turned out wrong, and now I'm angry, and by the time I could have looked at more details I have zoomed out and walked away.

Good luck with your project!

Best regards

Thomas

My goal was to get to the same point others had: being able to build the AICamera demo, but not use my own model. Because doing so was not trivial, I wrote down what I had to do to get to that point. I can see now that the way I wrote it may have come across as confrontational. Sorry about that. Thank you for taking the time to write the posts that you did. I hope I'm not the reason you stop helping random strangers on the internet. :slight_smile:

Hey, no worries.

Based on your feedback, I updated my master branch, and I'll suggest doing

git clone --recursive -b caffe2_android https://github.com/t-vi/pytorch.git

to get the repo.
I'll try to build from that and see if I also hit either the stod or the c10 thing.

As for using the model:
Do I understand correctly that the demo model works for you (but your own doesn't)?
Does the Predictor instantiation fail on an x86 Android virtual device as well?
Does it raise an exception when in debug mode?
I sometimes found error messages in logcat, or I enabled catching native exceptions (under "edit all breakpoints" in Android Studio; but be sure to set a breakpoint on the invocation you're interested in and use "disable until breakpoint is hit" or so, to not drown in exceptions during setup). Then one can try to inspect the exception message in the local variables or so.
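
Something like this around the native call also makes the failure text show up in logcat (just a sketch; the variable names and the tag are made up):

#include <android/log.h>

// in the JNI function that sets up the predictor
try {
  _predictor = new caffe2::Predictor(_initNet, _predictNet);
} catch (const std::exception& e) {
  // the message lands in logcat under the given tag
  __android_log_print(ANDROID_LOG_ERROR, "AICamera",
                      "Predictor construction failed: %s", e.what());
}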

Best regards

Thomas

Yes, the demo model worked, but my own did not.

I appreciate you asking more clarifying questions, but I've moved to TensorFlow (I said the same thing a week ago; I came back to try pytorch/caffe2 on Android without RNNs, but obviously that attempt failed as well) and do not plan on sinking any more time into trying to solve this. Thanks again for the time you put into trying to help me anyway!

Thanks!

Yes, I can understand that you'd not want to invest more into something you don't intend to use.

Best of luck. I hope PyTorch gets the chance to provide you with a better experience next year.

Best regards

Thomas

Hello @divinho, @tom, is there any update on this issue?

I have exported a custom CNN-LSTM model from PyTorch (v1.1) using the approach pytorch -> ONNX -> caffe2. I used NDK 19 and ONNX 1.3 and started from the official tutorial, GitHub - facebookarchive/AICamera: Demonstration of using Caffe2 inside an Android application.

I am kind of stuck on loading RNN layers in caffe2, similar to the problem that has been referenced above.

I would like to know if there is any progress, either with caffe2 or with working directly with the C++ front-end (LibTorch) for Android using RNNs.

Thanks in advance and thank you for your time.

black0017.

I just asked about this here but received no reply. :frowning:

Cross-posting from the other thread:

Basic PyTorch / Android support has been merged.
There currently is some hiccup due to the merging of libcaffe2 and libtorch (#20774), but it seems to work.
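
On the C++ side, using it boils down to loading a TorchScript model, roughly like this (a sketch; in the 1.0/1.1-era API torch::jit::load returns a shared_ptr to a script::Module, in later versions a Module by value, and "model.pt" plus the input shape are placeholders):

#include <torch/script.h>
#include <iostream>

int main() {
  // load a model exported with torch.jit.trace / torch.jit.script
  auto module = torch::jit::load("model.pt");
  std::vector<torch::jit::IValue> inputs;
  inputs.push_back(torch::ones({1, 3, 224, 224}));
  // run the forward pass and pull out the result tensor
  at::Tensor output = module->forward(inputs).toTensor();
  std::cout << output.sizes() << std::endl;
  return 0;
}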

Best regards

Thomas

Not all models are fast yet, but this one has custom ops, too.
