Would love the feedback!
So I gave it a shot for Android but immediately entered a world of pain that probably has little to do with PyTorch and more with Android and my lack of Android experience.
I should admit that my experience with Android is very limited (I adapted the Caffe2 AI Camera demo app to use PyTorch with ResNet, Mask R-CNN, and style-transfer models about a year ago and ported PyTorch to Android for it, but that is about it).
Some of these comments might eventually lead to useful additions to the tutorial, but I don't think I'm there yet.
The tutorial says:
We recommend you to open this project in Android Studio, in that case you will be able to install Android NDK and Android SDK using Android Studio UI.
This looks foolproof, but it seems not quite Thomas-proof:
- It seems that you absolutely need to use the latest Android Studio (mine was, I'm guessing, about 10 months old). If you don't have it, you get nondescript error messages that an internet search reveals meant "upgrade to the latest Android Studio" three versions ago, too.
- Maybe one could say that one should import the (Gradle) project rather than opening it. Opening did nothing for me.
Then I tried the Hello World app:
- Didn't seem to work on the Android emulator; I just got a white screen.
- Works on actual hardware (I get a Wolf or dog classified as such).
Happily, the PyTorch demo app worked better - it worked on both.
On my phone (a BQ Aquaris U Plus, so not a high-end phone), I get ~3.5 images per second for the quantized ResNet. This left me wondering what I should expect. In particular, I have been wondering whether the PyTorch build I'm using is a debug or a release build. I realize that Android release builds come with signing and such, so it would be a debug build from the Android Studio side, but that doesn't necessarily mean anything for the PyTorch library itself. I am guessing that it is a non-debug build of PyTorch, on the assumption that most people don't need a debug libtorch even when they are debugging their app.
It is neat to see this work out, and I admire that you furnished a text classification example, too. It would be super cool if you could add the model construction / serialization for the quantized ResNet and the text app, too (the HelloWorld app has the (trivially) simple tracing of ResNet-18, but I didn't see the equivalent for the PyTorch demo app).
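For reference, the HelloWorld-style export boils down to a few lines like the following. This is a hedged sketch, not the app's actual script, and I use a tiny stand-in module so it runs without downloading the resnet18 weights:

```python
import torch
import torch.nn as nn

# Stand-in for torchvision.models.resnet18(pretrained=True);
# any nn.Module is traced and saved the same way.
model = nn.Sequential(
    nn.Conv2d(3, 8, 3), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 1000),
)
model.eval()

# Trace with a representative input shape and save for the app's assets.
example = torch.rand(1, 3, 224, 224)
traced = torch.jit.trace(model, example)
traced.save("model.pt")

# Sanity check: the saved file loads back as a TorchScript module.
reloaded = torch.jit.load("model.pt")
print(reloaded(example).shape)  # torch.Size([1, 1000])
```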
All in all, it looks good! Thank you.
Best regards
Thomas
Thanks for the feedback! We'll get the docs updated and figure out why HelloWorld isn't working on the emulator.
The ResNet model is not quantized and is expected to be fairly slow. We might just delete it from the demo app.
The quantized MobileNetV2 model is based on this tutorial: https://pytorch.org/tutorials/advanced/static_quantization_tutorial.html . The text model was prepared by https://nbviewer.jupyter.org/gist/dreiss/ee4ff1ed2e137326d13e96bb4f953061 . The weights came from a trained PyText model. We'll look into getting these links included in the app or docs.
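The linked tutorial covers static quantization, which is what the demo's MobileNetV2 uses. As a hedged aside, the quickest self-contained way to see the quantize-then-script workflow is dynamic quantization on a toy model; the real model preparation differs, but the export shape is the same:

```python
import torch
import torch.nn as nn

# Toy float model; dynamic quantization replaces the Linear layers
# with int8 dynamically quantized equivalents at inference time.
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4)).eval()
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

# Script and save, as you would for the mobile app's assets.
scripted = torch.jit.script(quantized)
scripted.save("quantized.pt")

x = torch.rand(1, 16)
print(torch.jit.load("quantized.pt")(x).shape)  # torch.Size([1, 4])
```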
Thank you, David!
Best regards
Thomas
I trained a custom ResNet-18 and a MobileNetV2 with a 3-class output. However, they all failed when I substituted the "resnet18.pt" from the example with my own models.
findViewById(R.id.vision_card_resnet_click_area).setOnClickListener(v -> {
    final Intent intent = new Intent(VisionListActivity.this, ImageClassificationActivity.class);
    intent.putExtra(ImageClassificationActivity.INTENT_MODULE_ASSET_NAME, "resnet18-custom.pt");
    intent.putExtra(ImageClassificationActivity.INTENT_INFO_VIEW_TYPE,
        InfoViewFactory.INFO_VIEW_TYPE_IMAGE_CLASSIFICATION_RESNET);
    startActivity(intent);
});
Relevant error logs:
E/PyTorchDemo: Error during image analysis
com.facebook.jni.CppException: false CHECK FAILED at aten/src/ATen/Functions.h (empty at aten/src/ATen/Functions.h:3535)
(no backtrace available)
at org.pytorch.Module$NativePeer.initHybrid(Native Method)
at org.pytorch.Module$NativePeer.<init>(Module.java:70)
at org.pytorch.Module.<init>(Module.java:25)
at org.pytorch.Module.load(Module.java:21)
at org.pytorch.demo.vision.ImageClassificationActivity.analyzeImage(ImageClassificationActivity.java:167)
at org.pytorch.demo.vision.ImageClassificationActivity.analyzeImage(ImageClassificationActivity.java:31)
Is there anything else I need to do to change to use my custom model?
Thank you.
Yeah, well.
- There is no C++ backtrace, so it's hard to tell what went wrong. My guess from the Java traceback is that it fails while loading the model rather than while executing it.
- Are you sure you put a TorchScript (most probably traced) model in the right place?
The HelloWorld app has a .py for exporting the model.
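One quick sanity check along those lines (a sketch; the `resnet18-custom.pt` filename is taken from the snippet above): re-load the saved model with `torch.jit.load` on the desktop before copying it into the app's assets, since a model that fails there will certainly fail in the app.

```python
import torch
import torch.nn as nn

# Export a toy traced module, then verify it round-trips through the
# serialized format that the Android Module.load() call will consume.
model = nn.Linear(4, 3).eval()
example = torch.rand(1, 4)
torch.jit.trace(model, example).save("resnet18-custom.pt")

loaded = torch.jit.load("resnet18-custom.pt")
assert torch.allclose(loaded(example), model(example))
print("model round-trips OK on PyTorch", torch.__version__)
```

Checking `torch.__version__` here also guards against the version-mismatch failure mode mentioned later in the thread.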
Best regards
Thomas
Thanks for your response. I did put in a traced script module, though. And it's indeed failing at model loading time.
Hi @lkhphuc @tom. A common cause for this is converting the model with a version of PyTorch earlier than 1.3.0. At present, only models converted with the latest PyTorch version will load.
See my latest tutorial on PyTorch Mobile, including an easy-to-reuse open-source image recognition example. It provides a useful guide:
https://heartbeat.fritz.ai/pytorch-mobile-image-classification-on-android-5c0cfb774c5b
My model was traced with jit.trace and saved yesterday on Google Colab. I checked, and the PyTorch version is 1.3.0+cu100. I think that was the latest already.
If the PyTorch version is fine, this error can also occur when the model path is not found, since the Module class accepts an absolute path to the model file. I suggest you confirm that the path points to a valid file.
Are you able to share this model? If so, we can debug and improve this error message.
@johnolafenwa Even if I rename my model to "resnet18.pt" to replace the current one in the Android repo, it still has the problem.
@David_Reiss You can find the model here: https://drive.google.com/file/d/1fAhLpkoqNR32KQtzEkWe0LhBfAYb7Eiu/view?usp=sharing
Thank you.
Thanks for sharing the model. We are following up on the issue at: https://github.com/pytorch/pytorch/issues/28379
So I have played with it some more:
- I must say I dislike the hack around reading models (copying from assets to the filesystem). I got burnt by forgetting to overwrite the copy more than once. My libtorch/JNI adaptation used torch::jit::load with string streams. This is less memory-efficient, of course, but for development it is much more convenient. If it isn't controversial, I could see if I file a PR.
- One of the first things I needed was a method to get output pictures; I'll file a PR for adding something like that to torchvision.
- I hit a bug with some arm32 ops. The model works fine on Android/x86, but colors/scanline widths are off on Android/arm(32). The net uses
aten::_convolution, aten::add, aten::contiguous, aten::instance_norm, aten::reflection_pad2d, aten::relu_, aten::tanh, prim::Constant, prim::GetAttr, prim::ListConstruct
with the fanciest convolutions being one with stride=2 and a transposed one with stride=2 and output_padding=1 (so no non-default dilations; kernel sizes 1, 3, 7). I'm still trying to narrow this down. As the error looks like some striding/contiguity problem, I tried inserting lots of contiguous() calls, but it didn't immediately help. I know this isn't narrowed down enough yet, unfortunately.
Best regards
Thomas
cc @David_Reiss and @ljk53
I must say I dislike the hack around reading models (copying from assets to fs).
We expect most production apps to download models at runtime, so reading from the filesystem should be more natural in that case. However, you're right that it feels hacky when you're first developing with the model as an asset. We would accept a PR that added reading the model directly from the APK.
I hit a bug with some arm32 ops.
Please let us know what you find. Or, if you are comfortable sharing the model, we can debug as well.
I tried out the hello-world app and could not get it to work on an emulator or actual device (Huawei P20 Lite). However, the demo app works perfectly.
I tried setting up a test with my own model (for person re-identification) using a similar setup as on the demo app and I get an error as soon as I load my model:
java.lang.RuntimeException: Unable to start activity ComponentInfo{com.example.pytorchmobile/com.example.pytorchmobile.MainActivity}: com.facebook.jni.CppException: false CHECK FAILED at ../c10/core/Backend.h (tensorTypeIdToBackend at ../c10/core/Backend.h:106)
(no backtrace available)
This is also the case when I try importing the Super Resolution model from here: https://pytorch.org/tutorials/advanced/super_resolution_with_caffe2.html
Also, another question: is PyTorch mobile stable enough and suitable for production? Or should we just use ONNX and Caffe2 until it is more mature?
To me, the error being in the Backend code sounds like your model is on the GPU but needs to be on the CPU.
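If that is the cause, one fix (a sketch under that assumption) is to move the model to the CPU before tracing and saving:

```python
import torch
import torch.nn as nn

model = nn.Linear(8, 2)
# If the model had been trained on a GPU, bring it back before export;
# .cpu() is a no-op when it is already on the CPU, as here.
model = model.cpu().eval()

example = torch.rand(1, 8)
torch.jit.trace(model, example).save("model_cpu.pt")
print(torch.jit.load("model_cpu.pt")(example).device)  # cpu
```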
I'm not in a position to make official pronouncements, but the bulk of PyTorch mobile is "just libtorch", and that is reasonably stable. There might be bugs, and I would expect mobile to get better over time (maybe the API can be made nicer, and there certainly seems to be room for more speed), but I would expect that what runs today will continue to run well. Also, people here will try to help you when you run into something blocking you.
Caffe2, on the other hand, is no longer supported.
Best regards
Thomas
Thanks for the response. The model was trained with CPU-only PyTorch, so I'm not sure that can be the problem.
My main recommendation for PyTorch mobile is to add more descriptive error messages. Most errors just show up in logcat as a C++ file (usually backend, import, or function_schema_inl.h) and a line number, which is hard to debug. Right now the platform is very new, and there isn't much information out there on how to fix errors, so we have to figure it out from the error messages.
Overall though, I think the API is actually quite nice, and it's really great that mobile support has been added.
We tried both the "Hello World" and "Demo" apps. We were able to run them on a device. The ResNet-18 model with an FC layer as the final layer is working fine. But when we replace the FC layer with a Conv1x1 layer with the same weights and bias, the app does not give the same output as the original one. We used a Conv1x1 layer instead of an FC layer in our models because some frameworks, like Intel's OpenVINO, do not support FC layers.
In PyTorch, both models give the same output. In the app, they give different outputs. Is there an issue in the torch.jit.trace API when Conv1x1 is used in a model?
The Conv1x1 layer's weights and bias are initialized as follows:
resnet_model.conv1x1.weight.data = (resnet_model.model.fc.weight.data).view(1000, 512, 1, 1)
resnet_model.conv1x1.bias.data = resnet_model.model.fc.bias.data
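For what it's worth, that weight copy looks right in plain PyTorch. Here is a self-contained check (with a small Linear layer standing in for the 512-to-1000 FC of ResNet-18) that a 1x1 convolution with reshaped FC weights matches the FC output:

```python
import torch
import torch.nn as nn

# Build an FC layer and a 1x1 conv, copying the FC weights into the conv
# the same way as in the snippet above (view to out x in x 1 x 1).
fc = nn.Linear(512, 1000)
conv1x1 = nn.Conv2d(512, 1000, kernel_size=1)
conv1x1.weight.data = fc.weight.data.view(1000, 512, 1, 1)
conv1x1.bias.data = fc.bias.data

x = torch.rand(2, 512)                                # pooled features, as after avgpool
out_fc = fc(x)                                        # shape (2, 1000)
out_conv = conv1x1(x.view(2, 512, 1, 1)).flatten(1)   # shape (2, 1000)
assert torch.allclose(out_fc, out_conv, atol=1e-5)
```

If the two agree on desktop but not in the app, that points at the mobile execution of the traced conv rather than at the weight copy itself.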
Your feedback is appreciated.
Thanks and Regards
Prasad