Module.forward() method runs slowly on Pixel 3

I am trying to run PyTorch on Android by following this document. The model works fine, but I noticed that the forward() method takes too much time.

On my Pixel 3 it takes 300 ms on average to run (MobileNetV2 from torchvision). I also tried the pretrained SSD-MobileNetV2-Lite model from this repo, and it takes about 700 ms to run.

The same model on TensorFlow mobile seems much faster.

Can anyone tell me why and how to solve this performance issue?

Hello @royaff0,

  1. Could you please try our nightlies ( https://github.com/pytorch/pytorch/tree/master/android#nightly ) and check what the performance is? (Some significant performance improvements were merged recently.)

  2. You can try a quantized pre-trained model (for example mobilenet_v2, specifying quantize=True):
    https://github.com/pytorch/vision/blob/master/torchvision/models/quantization/mobilenet.py#L59
    Are the quantized model's accuracy and performance acceptable for your task?

TensorFlow Lite supports mobile GPUs, while PyTorch Android currently runs only on the CPU; we are working on GPU support.

The pytorch_android:1.9 dependencies linked in point 1 are not available in the Nexus repository. One has to downgrade to pytorch_android:1.8, and the speed with that is still not satisfactory for my use case.

Is 1.9 available somewhere?
Should we expect a significant improvement with 1.9 for CPUs?

@Mr-G
The latest published nightlies are 1.8.0-SNAPSHOT

300 ms and 700 ms are far from what we see in our benchmarks.

In the HelloWorld tutorial, model preparation is naive and lacks performance optimizations.

We have a separate tutorial for mobile optimizations: Pytorch Mobile Performance Recipes — PyTorch Tutorials 1.8.0 documentation

The main steps are to use optimize_for_mobile for model preparation:

import torch
import torchvision
from torch.utils.mobile_optimizer import optimize_for_mobile

model = torchvision.models.mobilenet_v2(pretrained=True)
model.eval()

torchscript_model = torch.jit.script(model)
torchscript_model_optimized = optimize_for_mobile(torchscript_model)
torch.jit.save(torchscript_model_optimized, "model.pt")

and to use an input tensor in the channels_last memory format:

Tensor inputNHWC = Tensor.fromBlob(dataNHWC, inputShape, MemoryFormat.CHANNELS_LAST);
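
For reference, the element order that a channels_last buffer holds can be sketched in plain Python. This is only an illustration, assuming (as the name dataNHWC suggests) that the buffer passed to Tensor.fromBlob is laid out in NHWC order; the helper nchw_to_nhwc is hypothetical and not part of PyTorch.

```python
from typing import List, Sequence


def nchw_to_nhwc(data: Sequence[float], n: int, c: int, h: int, w: int) -> List[float]:
    """Reorder a flat NCHW buffer into NHWC (channels_last) element order."""
    if len(data) != n * c * h * w:
        raise ValueError("buffer length does not match the given shape")
    out = [0.0] * len(data)
    for ni in range(n):
        for ci in range(c):
            for hi in range(h):
                for wi in range(w):
                    src = ((ni * c + ci) * h + hi) * w + wi  # offset in NCHW order
                    dst = ((ni * h + hi) * w + wi) * c + ci  # offset in NHWC order
                    out[dst] = data[src]
    return out


# A 1x2x1x2 tensor: channel 0 holds [1, 2], channel 1 holds [10, 20].
print(nchw_to_nhwc([1.0, 2.0, 10.0, 20.0], 1, 2, 1, 2))
# -> [1.0, 10.0, 2.0, 20.0]
```

In NHWC order, all channel values of one pixel sit next to each other, which is the layout the channels_last kernels read.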

The ChannelsLast memory format is not integrated with TensorImageUtils yet.
Once the PR "[android][utils] Support ChannelsLast in TensorImageUtils by IvanKobzarev · Pull Request #48990 · pytorch/pytorch · GitHub" lands, it will be available as an argument:

    TensorImageUtils.imageYUV420CenterCropToFloatBuffer(
        image.getImage(),
        rotationDegrees,
        TENSOR_WIDTH,
        TENSOR_HEIGHT,
        TensorImageUtils.TORCHVISION_NORM_MEAN_RGB,
        TensorImageUtils.TORCHVISION_NORM_STD_RGB,
        mInputTensorBuffer,
        0,
        MemoryFormat.CHANNELS_LAST);

The optimization line slows down my model:

torchscript_model_optimized = optimize_for_mobile(torchscript_model)

The model contains a CNN and a bidirectional LSTM run for 600 steps. On a Pixel 4, inference slows down from 5 s to 16 s when I enable the optimization.
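
One way to compare the two variants on the same input is a small timing helper like the sketch below. The name mean_latency_ms and its parameters are hypothetical, and numbers measured on a desktop may not match on-device behavior, since optimize_for_mobile targets mobile CPU backends.

```python
import time
from typing import Callable


def mean_latency_ms(fn: Callable[[], object], warmup: int = 3, runs: int = 10) -> float:
    """Return the mean wall-clock time of fn() in milliseconds."""
    for _ in range(warmup):  # warm-up calls are not timed
        fn()
    start = time.perf_counter()
    for _ in range(runs):
        fn()
    return (time.perf_counter() - start) * 1000.0 / runs


# Hypothetical usage, assuming torch is installed and x is a sample input:
#   plain = torch.jit.script(model)
#   optimized = optimize_for_mobile(plain)
#   print(mean_latency_ms(lambda: plain(x)))
#   print(mean_latency_ms(lambda: optimized(x)))
```

Timing both the scripted and the optimized model this way makes it easier to report which step introduces the slowdown.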