Threading of Model Pytorch Android

I am trying to deploy my model on an Android device. The model should run fast on Android, but inference is slow because the model is big.
Is there a threading option when declaring the model interpreter on Android, as there is in TensorFlow Lite?

Hello @mohit7
At the moment we do not expose this setting, but we are considering adding an option to control it.
The current thread count is determined per device: the number of threads is roughly the number of big cores on the device.
You can find more details about the per-device thread count in the code of caffe2::ThreadPool::defaultThreadPool().
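As a rough illustration (a sketch, not the actual caffe2 logic): from the JVM side you can only see the total logical core count, which is an upper bound on what the native pool will pick, since big.LITTLE devices typically have fewer big cores than total cores:

```java
public class CoreCount {
    public static void main(String[] args) {
        // Total logical cores (big + LITTLE). The native default pool
        // targets roughly the number of big cores, so it is usually
        // smaller than or equal to this value.
        int totalCores = Runtime.getRuntime().availableProcessors();
        System.out.println("Total cores: " + totalCores);
    }
}
```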

Please write to us if you have threading issues with a particular device.

Hello @mohit7

We just exposed control over the global number of threads used by PyTorch Android; it has landed in master:

method org.pytorch.Module#setNumThreads(int numThreads)

(https://github.com/pytorch/pytorch/blob/master/android/pytorch_android/src/main/java/org/pytorch/Module.java#L57)
The latest Android nightlies already include it: https://github.com/pytorch/pytorch/tree/master/android#nightly (you might need the Gradle argument --refresh-dependencies if you are already using them).

Module module = Module.load(moduleFileAbsoluteFilePath);
module.setNumThreads(1); // use a single thread for inference

This is new functionality, please report if you find any issues with it.

Thanks @IvanKobzarev it worked.
But you should put an explicit restriction on the number of threads the user can set, because beyond a certain limit the run time increases instead of decreasing.

One more question:
Just as in desktop PyTorch, can we pass a batch of images in PyTorch Android?

Thanks,
Mohit Ranawat

Yes, performance degrades when the thread count exceeds the number of CPU cores, due to extra thread switches and thread contention.
But additional capping may make this API less transparent; we will think about it.
Our plan is to revise the default thread pool size so that it is optimal for inference time on as many devices as possible.
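If you want a guard on the application side while the default is being revised, a minimal sketch (assuming you treat the total core count as the cap) could be:

```java
public class ThreadCap {
    // Clamp a requested thread count to [1, availableProcessors()],
    // since oversubscribing cores adds context switches and contention.
    static int cappedNumThreads(int requested) {
        int cores = Runtime.getRuntime().availableProcessors();
        return Math.max(1, Math.min(requested, cores));
    }

    public static void main(String[] args) {
        // Never exceeds the device core count, never goes below 1.
        System.out.println(cappedNumThreads(64));
    }
}
```

The clamped value can then be passed to setNumThreads.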

Vision models work the same way: the input shape is N_images * N_channels (e.g. 3) * IMAGE_HEIGHT * IMAGE_WIDTH.
So if you prepare a Tensor with N_images > 1, it should work as in the desktop version.

But org.pytorch.torchvision.TensorImageUtils only has an API for preparing tensors with N_images == 1, so some additional code is needed to prepare them with N_images > 1.
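Until such helpers exist, the packing itself is straightforward: concatenate the per-image CHW float data into one NCHW array and shape it with N_images in the first dimension. A minimal sketch of the layout (plain Java, no Android dependencies; wrapping the result via Tensor.fromBlob(batch, new long[]{n, channels, height, width}) is left to the caller):

```java
public class BatchPack {
    // Concatenate N single-image CHW float arrays into one NCHW array.
    // Each images[i] must have length channels * height * width.
    static float[] packBatch(float[][] images, int channels, int height, int width) {
        int perImage = channels * height * width;
        float[] batch = new float[images.length * perImage];
        for (int n = 0; n < images.length; n++) {
            System.arraycopy(images[n], 0, batch, n * perImage, perImage);
        }
        return batch;
    }

    public static void main(String[] args) {
        // Two tiny 1x2x2 "images" packed into a 2x1x2x2 batch.
        float[][] images = {
            {1f, 2f, 3f, 4f},
            {5f, 6f, 7f, 8f}
        };
        float[] batch = packBatch(images, 1, 2, 2);
        System.out.println(java.util.Arrays.toString(batch));
        // → [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
    }
}
```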

Do you think it would be useful for the TensorImageUtils API to have helper methods that prepare Tensors for image batches?

Hey @IvanKobzarev, helper methods for image batches would be good, as their usefulness may vary from application to application.
In our case model prediction is critical, so we thought of passing multiple images to the model to reduce mispredictions.

Hey @IvanKobzarev, setNumThreads has suddenly stopped working.
Has it been removed from the nightly version?
Thanks,
Mohit Ranawat


Hello @mohit7,

Sorry for the inconvenience.
We moved the setNumThreads method to a separate class, org.pytorch.PyTorchAndroid, as a static method.
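For reference, a call site after this change would look roughly like the following (a sketch assuming a nightly artifact that ships org.pytorch.PyTorchAndroid):

```java
import org.pytorch.Module;
import org.pytorch.PyTorchAndroid;

// setNumThreads is now a static method on PyTorchAndroid,
// not an instance method on Module.
PyTorchAndroid.setNumThreads(1);
Module module = Module.load(moduleFileAbsoluteFilePath);
```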

Hey @IvanKobzarev, now we have a problem.
The problem we initially faced, loading a model with a specific layer, is resolved in the nightly version but not in the stable version. So if I use the nightly version I can't use threading.
Can you also move it to the nightly version, or give me a suggestion for this problem?

Hello @mohit7,
PyTorchAndroid with setNumThreads has been published only in the nightly builds (they are published from the master branch); it will be in stable only with the 1.4 release (January).

I rechecked our sonatype nightlies:
The latest published artifact is https://oss.sonatype.org/service/local/repositories/snapshots/content/org/pytorch/pytorch_android/1.4.0-SNAPSHOT/pytorch_android-1.4.0-20191219.100544-76.aar

It contains PyTorchAndroid.class (with setNumThreads) in classes.jar.

Just to check, your Gradle setup to use nightlies should be:

repositories {
    maven {
        url "https://oss.sonatype.org/content/repositories/snapshots"
    }
}

dependencies {
    ...
    implementation 'org.pytorch:pytorch_android:1.4.0-SNAPSHOT'
    implementation 'org.pytorch:pytorch_android_torchvision:1.4.0-SNAPSHOT'
    ...
}

You might need the --refresh-dependencies flag to force an update of the dependencies if your Gradle cache expiry is long.
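For example (assuming the standard Gradle wrapper and a debug build):

```shell
./gradlew assembleDebug --refresh-dependencies
```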

Let me know if you have any problems with it.

Hey @IvanKobzarev, is setNumThreads supported in the stable version, or has it been removed?

Thanks, it’s working.

What's the equivalent API for setting the thread count on iOS? I manually changed this line to force numThreads_ to 1 (and recompiled the libs), but it has no effect on benchmark performance.

Hey there!

We investigated concurrency with PyTorch Lite and were unable to get any benefit from it using Kotlin coroutines.

I wrote this Android instrumented test to reproduce it:

@Test
fun testConcurrency() = runBlocking {
  val assetsManager = InstrumentationRegistry.getInstrumentation().context.assets
  val module1 = LiteModuleLoader.loadModuleFromAsset(assetsManager, "dummy_module_1.ptl")
  val module2 = LiteModuleLoader.loadModuleFromAsset(assetsManager, "dummy_module_2.ptl")
  
  val input = IValue.from(
    Tensor.fromBlob(
      floatArrayOf(1f, 1f, 1f, 1f, 1f, 1f, 1f, 1f, 1f, 1f), 
      longArrayOf(10)
    )
  )

  val singleInferenceDuration = measureTimeMillis {
    module1.forward(input)
  }

  println("Duration single inference: $singleInferenceDuration ms")

  val concurrentInferenceDuration = measureTimeMillis {
    listOf(
      launch { module1.forward(input) },
      launch { module2.forward(input) }
    ).joinAll()
  }

  println("Duration concurrent inference: $concurrentInferenceDuration ms")
}

The modules dummy_module_1 and dummy_module_2 are just linear layers.

The output is:

Duration single inference: 47 ms
Duration concurrent inference: 89 ms

Setting PyTorchAndroid.setNumThreads to different values did not change anything.

@IvanKobzarev any ideas here?