GPU inference on iOS

Hi,
I’ve spent a couple of days trying to get MobileSAM (Segment Anything based on TinyViT) running on the GPU on iOS, but so far I’ve only managed to get a CPU version working, with a latency of 2 seconds.
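For reference, the CPU path that does work for me looks roughly like this. It’s only a sketch: the `sam_model_registry` entry point and checkpoint path are taken from the MobileSAM repo, and I haven’t re-verified the exact calls while writing this post.

```python
# Rough sketch of the CPU-only export that gives me the ~2 s latency.
# Assumes the MobileSAM repo's sam_model_registry API and checkpoint path.
import torch
from torch.utils.mobile_optimizer import optimize_for_mobile
from mobile_sam import sam_model_registry

sam = sam_model_registry["vit_t"](checkpoint="./weights/mobile_sam.pt")
encoder = sam.image_encoder.eval()

example = torch.rand(1, 3, 1024, 1024)   # SAM's default input resolution
traced = torch.jit.trace(encoder, example)

# The default (CPU) mobile optimization goes through; backend="metal" is
# where the strided-convolution blocker mentioned below shows up.
optimized = optimize_for_mobile(traced)
optimized._save_for_lite_interpreter("mobilesam_encoder_cpu.ptl")
```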

What would be the options for speeding this up…?

  • PyTorch Mobile with the Metal backend: PyTorch Lite doesn’t support a couple of operators, which I was able to work around, but in the end the Metal backend’s missing support for strided convolutions became a blocker.
  • PyTorch MPS backend: there is no information about whether this even targets mobile.
  • Apple coremltools: convert via ONNX - only a subset of operators is supported (a rough sketch of the coremltools route follows this list).
  • ONNX Runtime? (the ONNX export sketch further below would be the starting point)
  • Convert to TFLite via ONNX? The converted model had some weird metadata that made this a blocker (operator names where operator type names should be).
  • PyTorch ExecuTorch: still in development, earliest end of 2024.
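On the coremltools item: newer coremltools versions also have a direct TorchScript frontend, so a conversion attempt doesn’t necessarily have to go through ONNX. A minimal sketch of what I mean (the model handle, checkpoint path and compute-unit choice are my assumptions, not something I’ve verified on device):

```python
# Sketch of a coremltools conversion straight from TorchScript (no ONNX step).
# Whether every TinyViT op converts cleanly is exactly the open question.
import torch
import coremltools as ct
from mobile_sam import sam_model_registry

encoder = sam_model_registry["vit_t"](checkpoint="./weights/mobile_sam.pt").image_encoder.eval()
example = torch.rand(1, 3, 1024, 1024)
traced = torch.jit.trace(encoder, example)

mlmodel = ct.convert(
    traced,
    inputs=[ct.TensorType(name="image", shape=example.shape)],
    convert_to="mlprogram",            # ML Program format (.mlpackage)
    compute_units=ct.ComputeUnit.ALL,  # let Core ML pick GPU / Neural Engine
)
mlmodel.save("MobileSAMEncoder.mlpackage")
```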

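The ONNX Runtime and TFLite-via-ONNX items both start from the same ONNX export, which for me looks roughly like this (the opset version and output naming are guesses on my part):

```python
# ONNX export that the ONNX Runtime / TFLite routes would start from.
import torch
from mobile_sam import sam_model_registry

encoder = sam_model_registry["vit_t"](checkpoint="./weights/mobile_sam.pt").image_encoder.eval()
example = torch.rand(1, 3, 1024, 1024)

torch.onnx.export(
    encoder,
    example,
    "mobilesam_encoder.onnx",
    input_names=["image"],
    output_names=["image_embeddings"],
    opset_version=17,
)

# For a quick sanity check off-device, ONNX Runtime's CoreML execution
# provider (if your build includes it) would be used roughly like this:
# import onnxruntime as ort
# sess = ort.InferenceSession("mobilesam_encoder.onnx",
#                             providers=["CoreMLExecutionProvider", "CPUExecutionProvider"])
```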
It seems that making this work and supporting it end-to-end has not been a priority for either Apple or PyTorch… Is there a way out?
