Feasibility of video activity recognition model on Android

I have a simple CNN-LSTM classifier that takes video input (<60 seconds, 10 fps, greyscale, currently around 200x160 resolution). On desktop, I load the entire video file into memory as a Tensor and feed the sample to the model. How feasible would it be to run inference with PyTorch Mobile in an Android application, given memory, compute, and software limitations? And if it isn't feasible as-is, is there any guidance on how I could change my approach to make the memory management work?
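For a rough sense of scale, here is a back-of-envelope estimate of the full-video tensor size described above, assuming float32 frames (an assumption on my part; uint8 frames would be 4x smaller):

```python
# Peak size of the whole-clip tensor: <=60 s at 10 fps,
# greyscale, 200x160, stored as float32 (4 bytes/element).
frames = 60 * 10            # 600 frames
pixels = 200 * 160          # per greyscale frame
size_mb = frames * pixels * 4 / 1024**2
print(f"{size_mb:.1f} MB")  # ~73.2 MB
```

Around 73 MB for the input tensor alone is large but not obviously impossible on a modern phone; it's the decoded Bitmaps and model activations on top of that which tend to hit Android per-app heap limits.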

Additionally, video preprocessing that is relatively trivial in Python doesn't seem straightforward on Android. I'd need to:

  1. Take the .mp4 file saved by the user's camera recording, convert it to greyscale, and lower the resolution.
  2. Convert the result into a Tensor.
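On the Python side, the two steps amount to something like the following pure-NumPy sketch (the resize is a crude nearest-neighbour stand-in for a real resizer such as `cv2.resize`; the frame source is assumed to come from whatever decoder you use, e.g. OpenCV's `VideoCapture`):

```python
import numpy as np

def to_grey(frame_rgb: np.ndarray) -> np.ndarray:
    """Greyscale via ITU-R BT.601 luma weights; frame_rgb is HxWx3 uint8."""
    return frame_rgb @ np.array([0.299, 0.587, 0.114], dtype=np.float32)

def downsample(frame: np.ndarray, out_h: int = 200, out_w: int = 160) -> np.ndarray:
    """Nearest-neighbour resize of a single HxW frame (illustrative only)."""
    h, w = frame.shape
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    return frame[rows[:, None], cols]

def video_to_tensor(frames_rgb) -> np.ndarray:
    """Steps 1+2: greyscale, resize, stack into a (T, H, W) float array in [0, 1]."""
    return np.stack([downsample(to_grey(f)) for f in frames_rgb]) / 255.0
```

On Android the equivalent per-frame tools exist, just under different names: `MediaMetadataRetriever.getFrameAtTime()` can decode individual frames to Bitmaps, and PyTorch Mobile's `TensorImageUtils` converts a Bitmap to a float tensor, though stitching frames into a single video tensor is left to you.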

Looking through the PyTorch documentation, I haven't found a great way to do step 1. There may be other libraries for video processing, and possibly OpenCV could be used on Android. For step 2 I've seen plenty of examples using images (e.g., existing functions to convert Bitmaps to float tensors), but I haven't seen good examples of this for video.

Would love any guidance on where to look to build out this process, or likewise, confirmation that this isn’t currently feasible (or highly difficult).


@kwj2104 Please check the video classification example we added recently, based on the PyTorchVideo library:

Github link for Android example: android-demo-app/TorchVideo at master · pytorch/android-demo-app · GitHub

Corresponding iOS version: ios-demo-app/TorchVideo at master · pytorch/ios-demo-app · GitHub