Anyone tried out the PyTorch Mobile demo apps yet?

Are you able to share this model? If so, we can debug and improve this error message.

@johnolafenwa Even if I change my model name to ‘resnet18.pt’ to replace the current one in the Android repo, it still has the same problem.
@David_Reiss You can find the model here: https://drive.google.com/file/d/1fAhLpkoqNR32KQtzEkWe0LhBfAYb7Eiu/view?usp=sharing

Thank you.

Thanks for sharing the model. We are following up on the issue at: https://github.com/pytorch/pytorch/issues/28379

So I have played with it some more:

  • I must say I dislike the hack around reading models (copying them from assets to the filesystem). I got burnt more than once by forgetting to overwrite the copy. My libtorch/JNI adaptation uses torch::jit::load with string streams (see the sketch after this list). That is less memory-efficient, of course, but much more convenient for development. If it isn’t controversial, I could file a PR.
  • One of the first things I needed was a method to get output pictures; I’ll file a PR to add something like that to torchvision.
  • I hit a bug with some arm32 ops. The model works fine on Android/x86, but colors/scanline widths are off on Android/arm(32). The net uses aten::_convolution, aten::add, aten::contiguous, aten::instance_norm, aten::reflection_pad2d, aten::relu_, aten::tanh, prim::Constant, prim::GetAttr, prim::ListConstruct, with the fanciest convolutions being one with stride=2 and a transposed one with stride=2 and output_padding=1 (so no non-default dilations; kernel sizes are 1, 3, and 7). I’m still trying to narrow this down. Since the error looks like some “striding”/contiguous problem, I tried inserting lots of contiguous calls, but that didn’t immediately help. I know this isn’t narrowed down enough yet, unfortunately.
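
Since the JNI code itself isn’t in this thread, here is the stream-loading idea in Python terms (a minimal sketch; “model.pt” is a placeholder, and the C++ torch::jit::load has an analogous std::istream overload that the JNI adaptation uses):

    import io
    import torch

    # Read the serialized TorchScript module into memory. On Android the
    # bytes would come from the APK's assets via JNI rather than from a file.
    with open("model.pt", "rb") as f:
        buffer = io.BytesIO(f.read())

    # torch.jit.load accepts any file-like object, so no temporary copy
    # on the filesystem is needed.
    module = torch.jit.load(buffer)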

Best regards

Thomas

cc @David_Reiss and @ljk53

I must say I dislike the hack around reading models (copying from assets to fs).

We expect most production apps to download models at runtime, so reading from the filesystem should be more natural in that case. However, you’re right that it feels hacky when you’re first developing with the model as an asset. We would accept a PR that added reading the model directly from the APK.

I hit a bug with some arm32 ops.

Please let us know what you find. Or, if you are comfortable sharing the model, we can debug as well.

I tried out the hello-world app and could not get it to work on an emulator or actual device (Huawei P20 Lite). However, the demo app works perfectly.

I tried setting up a test with my own model (for person re-identification) using a setup similar to the demo app’s, and I get an error as soon as I load my model:

java.lang.RuntimeException: Unable to start activity ComponentInfo{com.example.pytorchmobile/com.example.pytorchmobile.MainActivity}: com.facebook.jni.CppException: false CHECK FAILED at ../c10/core/Backend.h (tensorTypeIdToBackend at ../c10/core/Backend.h:106)
    (no backtrace available)

This is also the case when I try importing the Super Resolution model from here: https://pytorch.org/tutorials/advanced/super_resolution_with_caffe2.html

Also, another question: is PyTorch Mobile stable enough and suitable for production? Or should we just use ONNX and Caffe2 until it is more mature?

To me, the error being in the Backend code sounds like your model is on the GPU but needs to be on the CPU.
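
An easy way to check on the desktop (an illustrative sketch; “model.pt” is a placeholder for your exported file):

    import torch

    # Force everything onto the CPU at load time, then verify that no
    # parameter is still pinned to CUDA before shipping the file.
    m = torch.jit.load("model.pt", map_location="cpu")
    assert all(p.device.type == "cpu" for p in m.parameters())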

I’m not in a position to make official pronouncements, but the bulk of PyTorch Mobile is “just libtorch”, and that is reasonably stable. There might be bugs, and I would expect mobile to get better over time (maybe the API can be made nicer, and there certainly seems to be some room for it to get faster), but I would expect that what runs today will continue to run well. Also, people here will try to help when you run into something blocking you.
Caffe2, on the other hand, is no longer supported.

Best regards

Thomas

Thanks for the response. The model was trained with CPU-only PyTorch, so I’m not sure that can be the problem.

My main recommendation for PyTorch Mobile is to add more descriptive error messages. Most errors just show up in logcat as a C++ file name (usually Backend.h, import.cpp, or function_schema_inl.h) and a line number, which is hard to debug. The platform is very new, and there isn’t much information out there on how to fix errors, so we have to figure things out from the error messages alone.

Overall though, I think the API is actually quite nice, and it’s really great that mobile support has been added.

We tried both the ‘Hello World’ and ‘Demo’ apps and were able to port them to a device. The ResNet18 model with an FC layer as the final layer works fine. But when we replace the FC layer with a Conv1x1 layer with the same weights and bias, the app does not give the same output as the original one. We use a Conv1x1 layer instead of an FC layer in our models because some frameworks, such as Intel’s OpenVINO, do not support FC layers.

In PyTorch, both models give the same output. In the app, they give different outputs. Is there an issue in the ‘torch.jit.trace’ API when a Conv1x1 layer is used in the model?

The Conv1x1 layer weights and bias are initialized from the FC layer as follows:

resnet_model.conv1x1.weight.data = (resnet_model.model.fc.weight.data).view(1000, 512, 1, 1)
resnet_model.conv1x1.bias.data = resnet_model.model.fc.bias.data
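
One way to localize the problem (an illustrative sketch; resnet_model and the 224x224 input shape are assumptions based on the snippet above) is to compare the eager and traced modules on the desktop before deploying:

    import torch

    resnet_model.eval()
    x = torch.rand(1, 3, 224, 224)

    with torch.no_grad():
        traced = torch.jit.trace(resnet_model, x)
        # False here means tracing itself introduces the divergence and it
        # can be debugged on the desktop; True points at the mobile side.
        print(torch.allclose(resnet_model(x), traced(x), atol=1e-5))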

Your feedback is appreciated.

Thanks and Regards
Prasad

No kidding. I hit a thing loading parameters where it just said “something ending with weight cannot be found in the saved model file”… Glad that is fixed now to give the full hierarchy. :slight_smile: I think this is more C++ than Android-specific.
It’s still strange that you would get that error message from a model that doesn’t have custom tensor type IDs (as far as I can see, that error message is for unknown type IDs and should be the same on mobile and on a PC). Does the model load on the computer if you use PyTorch 1.3?
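
A quick way to rule out a version mismatch (a minimal sketch; “model.pt” stands in for the exported file):

    import torch

    # The desktop version should match the one used for tracing/export;
    # a file saved with a mismatched release can fail to load.
    print(torch.__version__)
    m = torch.jit.load("model.pt", map_location="cpu")
    print(m)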

Best regards

Thomas

Also, another question: is PyTorch Mobile stable enough and suitable for production? Or should we just use ONNX and Caffe2 until it is more mature?

The feature is still experimental, so we do expect rough edges; it is brand new as of this quarter. In parallel, we are working through production deployment inside FB, so the code base is improving on a daily basis. If you are looking for something that moves more slowly and is super stable in the short term, the ONNX export path is probably your best bet (e.g., ONNX+CoreML for iOS). Let us know if we can help with your prod deployment here.

Thanks @Kareem_Belgharbi for the feedback and comments. We can definitely improve the error messaging for the next release.

For the API comments, can you elaborate on what you find nice?

Is there a way to shrink the size of the PyTorch lib for mobile deployment? We compiled the demo app without the model and the app size was ~79 MB. Pretty great so far; far fewer headaches than TFLite for getting models onto the phone.

I like how it’s very simple and to the point, and has nice convenience functions for working with images. I also really like how easy it is to go from tensors to IValues and to convert data to tensors, but what I like most is how easy the conversion process is, since it uses .pt files. With TensorFlow Lite I would always get a conversion error that took forever to debug, but here it’s painless.

I ended up fixing the error, but unfortunately I don’t remember how. I did get a whole bunch of other import errors, mostly from import.cpp, but in the end it turned out that I had compiled the model with a different version of PyTorch.

very helpful, thanks for sharing!

Edit: solved this. I’m not entirely sure how, since I changed a couple of things, but my best guess as to what caused it is either calling variable.int() or having multiple inputs to the model ((x, y)).

I’m getting this when trying to load an exported model on iOS (using PyTorch 1.3.1):

private lazy var module: TorchModule = {
    if let filePath = Bundle.main.path(forResource: "model", ofType: "pt"),
        let module = TorchModule(fileAtPath: filePath) {
        return module
    } else {
        fatalError("Can't find the model file!")
    }
}()

false CHECK FAILED at /Users/distiller/project/c10/core/Backend.h (tensorTypeIdToBackend at /Users/distiller/project/c10/core/Backend.h:106)

The export script looks like:

import torch
from model import TDNN_LSTM

torch.set_grad_enabled(False)
model = TDNN_LSTM.load_model("model_path.pth")
model = model.to("cpu")
model = model.eval()

x = torch.rand((1, 80, 300)).to("cpu")
y = torch.IntTensor([300]).to("cpu")
traced = torch.jit.trace(model, (x, y))
traced = traced.eval().to("cpu")

traced.save("traced.pt")

Any idea what could be causing this? I confirmed that the model runs correctly on the CPU in the Python interpreter. I figured it was a problem with the model being on CUDA, but I made (extra) sure to put everything on the CPU. I can upload a version of the model somewhere if that would help. Thanks!
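
One thing worth ruling out (a minimal sketch; “traced.pt” matches the save call above): tensors captured as constants during tracing keep their original device, and a module-level .to("cpu") may not rewrite them, so it can help to scan the serialized file on the desktop:

    import torch

    m = torch.jit.load("traced.pt", map_location="cpu")

    # Parameters and buffers are easy to check directly.
    for name, t in m.state_dict().items():
        if t.device.type != "cpu":
            print("non-CPU tensor:", name, t.device)

    # The graph dump also shows device annotations on constants baked in
    # by the tracer, which .to("cpu") may not touch.
    print(m.graph)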

Hi!

This is fixed in the latest snapshot build. Please make the changes seen here in your build file: https://github.com/ljk53/android-demo-app/commit/443b524b8dce3425548271b071f0545194f02fa5

I had the same problem :frowning:

My model (a variation of YOLO_pytorchv3) gives worse results on Android than on GPU. Is this expected behaviour? I have written custom code to parse the output tensor and do the NMS/post-processing of the outputs.
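
Not necessarily: before suspecting the mobile runtime, it can help to check whether the traced CPU module already diverges from the eager GPU model on the desktop (an illustrative sketch; the names model and traced, the 416x416 input, and a single-tensor output are all assumptions):

    import torch

    x = torch.rand(1, 3, 416, 416)

    with torch.no_grad():
        out_cpu = traced(x)                     # traced module on CPU
        out_gpu = model.cuda()(x.cuda()).cpu()  # eager model on GPU

    # A large difference here points at the export/trace (or float
    # precision), not at the Android runtime or the custom NMS code.
    print((out_cpu - out_gpu).abs().max())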