Would love the feedback!
So I gave it a shot for Android but immediately entered a world of pain that probably has little to do with PyTorch and more with Android and my lack of Android experience.
I should admit that my experience with Android is very limited (I adapted the Caffe2 AI Camera demo app to use PyTorch with ResNet, Mask R-CNN, and style-transfer models about a year ago and ported PyTorch to Android for it, but that is about it).
Some of these comments might eventually lead to useful additions to the tutorial, but I don't think I'm there yet.
The tutorial says:
We recommend you to open this project in Android Studio, in that case you will be able to install Android NDK and Android SDK using Android Studio UI.
This looks foolproof, but it seems not quite Thomas-proof:
- It seems that you absolutely need to use the latest Android Studio (mine was, I'm guessing, about 10 months old). If you don't have it, you get nondescript error messages that an internet search reveals meant "upgrade to the latest Android Studio" three versions ago, too.
- Maybe one could say that one should import the (Gradle) project rather than opening it. Opening did nothing for me.
Then I tried the Hello World app:
- Didn't seem to work on the Android emulator; I just got a white screen.
- Works on actual hardware (I get a Wolf or dog classified as such).
Happily, the PyTorch demo app worked better - it worked on both.
On my phone (a BQ Aquaris U Plus, so not a high-end phone), I get ~3.5 images per second for the quantized ResNet. This left me wondering what I should expect. In particular, I have been wondering whether the PyTorch build I'm using is a debug or a release build. I realize that Android release builds come with signing and such, so it would be a debug build from the Android Studio side, but that doesn't necessarily mean anything for the PyTorch library itself. I am guessing that it is a non-debug build of PyTorch, on the assumption that most people don't need a debug libtorch even when they are debugging their app.
It is neat to see this work out, and I admire that you furnished a text classification example, too. It would be super cool if you could add the model construction / serialization for the quantized ResNet and the text app, too (the HelloWorld app has the (trivially) simple tracing of ResNet-18, but I didn't see the equivalent for the PyTorch demo app).
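For reference, the HelloWorld-style export boils down to a few lines like the following. This is a hedged sketch, not the app's actual script, and I use a tiny stand-in module so it runs without downloading the resnet18 weights:

```python
import torch
import torch.nn as nn

# Stand-in for torchvision.models.resnet18(pretrained=True);
# any nn.Module is traced and saved the same way.
model = nn.Sequential(
    nn.Conv2d(3, 8, 3), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 1000),
)
model.eval()

# Trace with a representative input shape and save for the app's assets.
example = torch.rand(1, 3, 224, 224)
traced = torch.jit.trace(model, example)
traced.save("model.pt")

# Sanity check: the saved file loads back as a TorchScript module.
reloaded = torch.jit.load("model.pt")
print(reloaded(example).shape)  # torch.Size([1, 1000])
```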
All in all, it looks good! Thank you.
Best regards
Thomas
Thanks for the feedback! We'll get the docs updated and figure out why HelloWorld isn't working on the emulator.
The ResNet model is not quantized and is expected to be fairly slow. We might just delete it from the demo app.
The quantized MobileNetV2 model is based on this tutorial: https://pytorch.org/tutorials/advanced/static_quantization_tutorial.html . The text model was prepared by https://nbviewer.jupyter.org/gist/dreiss/ee4ff1ed2e137326d13e96bb4f953061 . The weights came from a trained PyText model. We'll look into getting these links included in the app or docs.
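The linked tutorial covers static quantization, which is what the demo's MobileNetV2 uses. As a hedged aside, the quickest self-contained way to see the quantize-then-script workflow is dynamic quantization on a toy model; the real model preparation differs, but the export shape is the same:

```python
import torch
import torch.nn as nn

# Toy float model; dynamic quantization replaces the Linear layers
# with int8 dynamically quantized equivalents at inference time.
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4)).eval()
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

# Script and save, as you would for the mobile app's assets.
scripted = torch.jit.script(quantized)
scripted.save("quantized.pt")

x = torch.rand(1, 16)
print(torch.jit.load("quantized.pt")(x).shape)  # torch.Size([1, 4])
```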
Thank you, David!
Best regards
Thomas
I trained a custom ResNet-18 and a MobileNetV2 with a 3-class output. However, they all failed when I substituted the "resnet18.pt" from the example with my own models.
findViewById(R.id.vision_card_resnet_click_area).setOnClickListener(v -> {
    final Intent intent = new Intent(VisionListActivity.this, ImageClassificationActivity.class);
    intent.putExtra(ImageClassificationActivity.INTENT_MODULE_ASSET_NAME, "resnet18-custom.pt");
    intent.putExtra(ImageClassificationActivity.INTENT_INFO_VIEW_TYPE,
        InfoViewFactory.INFO_VIEW_TYPE_IMAGE_CLASSIFICATION_RESNET);
    startActivity(intent);
});
Relevant error logs:
E/PyTorchDemo: Error during image analysis
com.facebook.jni.CppException: false CHECK FAILED at aten/src/ATen/Functions.h (empty at aten/src/ATen/Functions.h:3535)
(no backtrace available)
at org.pytorch.Module$NativePeer.initHybrid(Native Method)
at org.pytorch.Module$NativePeer.<init>(Module.java:70)
at org.pytorch.Module.<init>(Module.java:25)
at org.pytorch.Module.load(Module.java:21)
at org.pytorch.demo.vision.ImageClassificationActivity.analyzeImage(ImageClassificationActivity.java:167)
at org.pytorch.demo.vision.ImageClassificationActivity.analyzeImage(ImageClassificationActivity.java:31)
Is there anything else I need to do to change to use my custom model?
Thank you.
Yeah, well.
- There is no C++ backtrace, so it's hard to tell what went wrong. My guess from the Java traceback is that it fails while loading the model rather than while executing it.
- Are you sure you put a TorchScript (most probably traced) model in the right place?
The HelloWorld app has a .py for exporting the model.
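One quick sanity check along those lines (a sketch; the `resnet18-custom.pt` filename is taken from the snippet above): re-load the saved model with `torch.jit.load` on the desktop before copying it into the app's assets, since a model that fails there will certainly fail in the app.

```python
import torch
import torch.nn as nn

# Export a toy traced module, then verify it round-trips through the
# serialized format that the Android Module.load() call will consume.
model = nn.Linear(4, 3).eval()
example = torch.rand(1, 4)
torch.jit.trace(model, example).save("resnet18-custom.pt")

loaded = torch.jit.load("resnet18-custom.pt")
assert torch.allclose(loaded(example), model(example))
print("model round-trips OK on PyTorch", torch.__version__)
```

Checking `torch.__version__` here also guards against the version-mismatch failure mode mentioned later in the thread.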
Best regards
Thomas
Thanks for your response. I did put in a traced script module, though. And it's indeed failing at model loading time.
Hi @lkhphuc @tom. A common cause for this is converting the model with a version of PyTorch earlier than 1.3.0. At present, only models converted with the latest PyTorch version will load.
See my latest tutorial on PyTorch Mobile, including an easy-to-reuse open-source image recognition example. It provides a useful guide:
https://heartbeat.fritz.ai/pytorch-mobile-image-classification-on-android-5c0cfb774c5b
My model was traced with jit.trace and saved yesterday on Google Colab. I checked, and the PyTorch version is 1.3.0+cu100. I think that was the latest already.
If the PyTorch version is fine, this error can also occur when the model path is not found, since the Module class accepts an absolute path to the model file. I suggest you confirm that the path points to a valid file.
Are you able to share this model? If so, we can debug and improve this error message.
@johnolafenwa Even if I rename my model to "resnet18.pt" to replace the current one in the Android repo, it still has the problem.
@David_Reiss You can find the model here: https://drive.google.com/file/d/1fAhLpkoqNR32KQtzEkWe0LhBfAYb7Eiu/view?usp=sharing
Thank you.
Thanks for sharing the model. We are following up on the issue at: https://github.com/pytorch/pytorch/issues/28379
So I have played with it some more:
- I must say I dislike the hack around reading models (copying from assets to the filesystem). I got burnt by forgetting to overwrite the copy more than once. My libtorch/JNI adaptation used torch::jit::load with string streams. This is less memory-efficient, of course, but for development it is much more convenient. If it isn't controversial, I could see if I file a PR.
- One of the first things I needed was a method to get output pictures; I'll file a PR for adding something like that to torchvision.
- I hit a bug with some arm32 ops. The model works fine on Android/x86, but colors/scanline widths are off on Android/arm(32). The net uses
aten::_convolution, aten::add, aten::contiguous, aten::instance_norm, aten::reflection_pad2d, aten::relu_, aten::tanh, prim::Constant, prim::GetAttr, prim::ListConstruct
with the fanciest convolutions being one with stride=2 and a transposed one with stride=2 and output_padding=1 (so no non-default dilations; kernel sizes 1, 3, 7). I'm still trying to narrow this down. As the error looks like some striding/contiguity problem, I tried inserting lots of contiguous() calls, but it didn't immediately help. I know this isn't narrowed down enough yet, unfortunately.
Best regards
Thomas
cc @David_Reiss and @ljk53
I must say I dislike the hack around reading models (copying from assets to fs).
We expect most production apps to download models at runtime, so reading from the filesystem should be more natural in that case. However, you're right that it feels hacky when you're first developing with the model as an asset. We would accept a PR that added reading the model directly from the APK.
I hit a bug with some arm32 ops.
Please let us know what you find. Or, if you are comfortable sharing the model, we can debug as well.
I tried out the hello-world app and could not get it to work on an emulator or actual device (Huawei P20 Lite). However, the demo app works perfectly.
I tried setting up a test with my own model (for person re-identification) using a similar setup as on the demo app and I get an error as soon as I load my model:
java.lang.RuntimeException: Unable to start activity ComponentInfo{com.example.pytorchmobile/com.example.pytorchmobile.MainActivity}: com.facebook.jni.CppException: false CHECK FAILED at ../c10/core/Backend.h (tensorTypeIdToBackend at ../c10/core/Backend.h:106)
(no backtrace available)
This is also the case when I try importing the Super Resolution model from here: https://pytorch.org/tutorials/advanced/super_resolution_with_caffe2.html
Also, another question: is PyTorch mobile stable enough and suitable for production? Or should we just use ONNX and Caffe2 until it is more mature?
To me, the error being in the Backend code sounds like your model is on the GPU but needs to be on the CPU.
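If that is the cause, one fix (a sketch under that assumption) is to move the model to the CPU before tracing and saving:

```python
import torch
import torch.nn as nn

model = nn.Linear(8, 2)
# If the model had been trained on a GPU, bring it back before export;
# .cpu() is a no-op when it is already on the CPU, as here.
model = model.cpu().eval()

example = torch.rand(1, 8)
torch.jit.trace(model, example).save("model_cpu.pt")
print(torch.jit.load("model_cpu.pt")(example).device)  # cpu
```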
I'm not in a position to make official pronouncements, but the bulk of PyTorch mobile is "just libtorch", and that is reasonably stable. There might be bugs, and I would expect mobile to get better over time (maybe the API can be made nicer, and there certainly seems to be room for more speed), but I would expect that what runs today will continue to run well. Also, people here will try to help you when you run into something blocking you.
Caffe2, on the other hand, is no longer supported.
Best regards
Thomas
Thanks for the response. The model was trained with CPU-only PyTorch, so I'm not sure that can be the problem.
My main recommendation for PyTorch mobile is to add more descriptive error messages. Most errors just show up in logcat as a C++ file (usually backend, import, or function_schema_inl.h) and a line number, which is hard to debug. Right now the platform is very new, and there isn't much information out there on how to fix errors, so we have to figure it out from the error messages.
Overall though, I think the API is actually quite nice, and it's really great that mobile support has been added.
We tried both the "Hello World" and "Demo" apps. We were able to run them on a device. The ResNet-18 model with an FC layer as the final layer is working fine. But when we replace the FC layer with a Conv1x1 layer with the same weights and bias, the app does not give the same output as the original one. We used a Conv1x1 layer instead of an FC layer in our models because some frameworks, like Intel's OpenVINO, do not support FC layers.
In PyTorch, both models give the same output. In the app, they give different outputs. Is there an issue in the torch.jit.trace API when Conv1x1 is used in a model?
The Conv1x1 layer's weights and bias are initialized as follows:
resnet_model.conv1x1.weight.data = (resnet_model.model.fc.weight.data).view(1000, 512, 1, 1)
resnet_model.conv1x1.bias.data = resnet_model.model.fc.bias.data
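For what it's worth, that weight copy looks right in plain PyTorch. Here is a self-contained check (with a small Linear layer standing in for the 512-to-1000 FC of ResNet-18) that a 1x1 convolution with reshaped FC weights matches the FC output:

```python
import torch
import torch.nn as nn

# Build an FC layer and a 1x1 conv, copying the FC weights into the conv
# the same way as in the snippet above (view to out x in x 1 x 1).
fc = nn.Linear(512, 1000)
conv1x1 = nn.Conv2d(512, 1000, kernel_size=1)
conv1x1.weight.data = fc.weight.data.view(1000, 512, 1, 1)
conv1x1.bias.data = fc.bias.data

x = torch.rand(2, 512)                                # pooled features, as after avgpool
out_fc = fc(x)                                        # shape (2, 1000)
out_conv = conv1x1(x.view(2, 512, 1, 1)).flatten(1)   # shape (2, 1000)
assert torch.allclose(out_fc, out_conv, atol=1e-5)
```

If the two agree on desktop but not in the app, that points at the mobile execution of the traced conv rather than at the weight copy itself.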
Your feedback is appreciated.
Thanks and Regards
Prasad