Threading of Model Pytorch Android

I am trying to deploy my model on an Android device. The model should run fast on Android, but inference is slow because the model is big.
Is there a threading option when declaring the model interpreter on Android, as there is in TensorFlow Lite?

Hello @mohit7
At the moment we do not expose this setting, but we are considering adding an option to control it.
The current thread count is determined per device: the number of threads is roughly the number of big cores on the device.
You can find more details about the per-device thread count in the code of caffe2::ThreadPool::defaultThreadPool().
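As a rough illustration (a sketch, not the actual caffe2 logic): from the JVM side you can only see the total logical core count, which is an upper bound on what the native pool will pick, since big.LITTLE devices typically have fewer big cores than total cores:

```java
public class CoreCount {
    public static void main(String[] args) {
        // Total logical cores (big + LITTLE). The native default pool
        // targets roughly the number of big cores, so it is usually
        // smaller than or equal to this value.
        int totalCores = Runtime.getRuntime().availableProcessors();
        System.out.println("Total cores: " + totalCores);
    }
}
```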

Please write to us if you have threading issues with a particular device.

Hello @mohit7

We just exposed control over the global number of threads used by PyTorch Android; it has landed in master:

method org.pytorch.Module#setNumThreads(int numThreads)

(https://github.com/pytorch/pytorch/blob/master/android/pytorch_android/src/main/java/org/pytorch/Module.java#L57)
The latest Android nightlies already include it: https://github.com/pytorch/pytorch/tree/master/android#nightly (you might need the Gradle argument --refresh-dependencies if you are already using them).

Module module = Module.load(moduleFileAbsoluteFilePath);
module.setNumThreads(1); // use a single thread for inference

This is new functionality, please report if you find any issues with it.

Thanks @IvanKobzarev it worked.
But you should put an explicit restriction on the number of threads the user can set, because beyond a certain limit the run time increases instead of decreasing.

One more question:
Just as in desktop PyTorch, can we pass a batch of images in PyTorch Android?

Thanks,
Mohit Ranawat

Yes, performance degrades when the thread count exceeds the number of CPU cores, due to extra thread switches and thread contention.
But additional capping may make this API less transparent; we will think about it.
Our plan is to revise the default thread pool size so that it is optimal for inference time on as many devices as possible.
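If you want a guard on the application side while the default is being revised, a minimal sketch (assuming you treat the total core count as the cap) could be:

```java
public class ThreadCap {
    // Clamp a requested thread count to [1, availableProcessors()],
    // since oversubscribing cores adds context switches and contention.
    static int cappedNumThreads(int requested) {
        int cores = Runtime.getRuntime().availableProcessors();
        return Math.max(1, Math.min(requested, cores));
    }

    public static void main(String[] args) {
        // Never exceeds the device core count, never goes below 1.
        System.out.println(cappedNumThreads(64));
    }
}
```

The clamped value can then be passed to setNumThreads.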

Vision models work the same way: the input shape is N_images * N_channels (e.g. 3) * IMAGE_HEIGHT * IMAGE_WIDTH.
So if you prepare a Tensor with N_images > 1, it should work as in the desktop version.

But org.pytorch.torchvision.TensorImageUtils only has an API for preparing tensors with N_images == 1, so some additional code is needed to prepare them with N_images > 1.
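Until such helpers exist, the packing itself is straightforward: concatenate the per-image CHW float data into one NCHW array and shape it with N_images in the first dimension. A minimal sketch of the layout (plain Java, no Android dependencies; wrapping the result via Tensor.fromBlob(batch, new long[]{n, channels, height, width}) is left to the caller):

```java
public class BatchPack {
    // Concatenate N single-image CHW float arrays into one NCHW array.
    // Each images[i] must have length channels * height * width.
    static float[] packBatch(float[][] images, int channels, int height, int width) {
        int perImage = channels * height * width;
        float[] batch = new float[images.length * perImage];
        for (int n = 0; n < images.length; n++) {
            System.arraycopy(images[n], 0, batch, n * perImage, perImage);
        }
        return batch;
    }

    public static void main(String[] args) {
        // Two tiny 1x2x2 "images" packed into a 2x1x2x2 batch.
        float[][] images = {
            {1f, 2f, 3f, 4f},
            {5f, 6f, 7f, 8f}
        };
        float[] batch = packBatch(images, 1, 2, 2);
        System.out.println(java.util.Arrays.toString(batch));
        // → [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
    }
}
```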

Do you think it would be useful for the TensorImageUtils API to have helper methods that prepare Tensors for image batches?

Hey @IvanKobzarev, helper methods for image batches would be good, as their usefulness may vary from application to application.
In our case model prediction is critical, so we thought of passing multiple images to the model to reduce mispredictions.

Hey @IvanKobzarev, setNumThreads has suddenly stopped working.
Has it been removed from the nightly version?
Thanks,
Mohit Ranawat


Hello @mohit7,

Sorry for the inconvenience.
We moved the setNumThreads method to a separate class, org.pytorch.PyTorchAndroid, as a static method.
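For reference, a call site after this change would look roughly like the following (a sketch assuming a nightly artifact that ships org.pytorch.PyTorchAndroid):

```java
import org.pytorch.Module;
import org.pytorch.PyTorchAndroid;

// setNumThreads is now a static method on PyTorchAndroid,
// not an instance method on Module.
PyTorchAndroid.setNumThreads(1);
Module module = Module.load(moduleFileAbsoluteFilePath);
```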

Hey @IvanKobzarev, now we have a problem.
The problem we initially faced, loading a model with a specific layer, is resolved in the nightly version but not in the stable version. So if I use the nightly version I can't use threading.
Can you also move it to the nightly version, or give me a suggestion for this problem?

Hello @mohit7,
PyTorchAndroid with setNumThreads has been published only in the nightly builds (they are published from the master branch); it will be in stable only with the 1.4 release (January).

I rechecked our sonatype nightlies:
The latest published artifact is https://oss.sonatype.org/service/local/repositories/snapshots/content/org/pytorch/pytorch_android/1.4.0-SNAPSHOT/pytorch_android-1.4.0-20191219.100544-76.aar

It contains PyTorchAndroid.class (with setNumThreads) in classes.jar.

Just to check, your Gradle setup to use nightlies should be:

repositories {
    maven {
        url "https://oss.sonatype.org/content/repositories/snapshots"
    }
}

dependencies {
    ...
    implementation 'org.pytorch:pytorch_android:1.4.0-SNAPSHOT'
    implementation 'org.pytorch:pytorch_android_torchvision:1.4.0-SNAPSHOT'
    ...
}

You might need the --refresh-dependencies flag to force an update of the dependencies if your Gradle cache expiry is long.
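For example (assuming the standard Gradle wrapper and a debug build):

```shell
./gradlew assembleDebug --refresh-dependencies
```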

Let me know if you have any problems with it.

Hey @IvanKobzarev, is setNumThreads supported in the stable version, or has it been removed?

Thanks, it’s working.

What's the equivalent API for setting the thread count on iOS? I manually changed this line to force numThreads_ to 1 (and recompiled the libs), but it has no effect on benchmark performance.

Hey there!

We investigated concurrency with PyTorch Lite and were unable to get any benefit from it using Kotlin coroutines.

I wrote this Android instrumented test to reproduce it:

@Test
fun testConcurrency() = runBlocking {
  val assetsManager = InstrumentationRegistry.getInstrumentation().context.assets
  val module1 = LiteModuleLoader.loadModuleFromAsset(assetsManager, "dummy_module_1.ptl")
  val module2 = LiteModuleLoader.loadModuleFromAsset(assetsManager, "dummy_module_2.ptl")
  
  val input = IValue.from(
    Tensor.fromBlob(
      floatArrayOf(1f, 1f, 1f, 1f, 1f, 1f, 1f, 1f, 1f, 1f), 
      longArrayOf(10)
    )
  )

  val singleInferenceDuration = measureTimeMillis {
    module1.forward(input)
  }

  println("Duration single inference: $singleInferenceDuration ms")

  val concurrentInferenceDuration = measureTimeMillis {
    listOf(
      launch { module1.forward(input) },
      launch { module2.forward(input) }
    ).joinAll()
  }

  println("Duration concurrent inference: $concurrentInferenceDuration ms")
}

The modules dummy_module_1 and dummy_module_2 are just linear layers.

The output is:

Duration single inference: 47 ms
Duration concurrent inference: 89 ms

Setting PyTorchAndroid.setNumThreads to different values did not change anything.

@IvanKobzarev any ideas here?