PyTorch vision on Android with AMD GPU

What is the most mature method of running inference for PyTorch models on a mobile AMD GPU? Specifically, I’m targeting an AMD Navi 2 GPU with 4 WGPs (workgroup processors). Below are the options I’m aware of; I’d love to hear from anyone with direct experience who can confirm, correct, or suggest others:

  • Compiling PyTorch 2.0 for an AMD target: I gather this is possible, but early in development and intended more for desktop than mobile. Is that right?
  • Using PyTorch Mobile and TorchVision on Android, an established path but one currently without GPU support, let alone AMD GPU support. Is that correct?
  • Running inference on the Android GPU via PyTorch’s Vulkan Compute backend, Android NNAPI, or libMACE via ONNX. Is it true that the Vulkan backend may arbitrarily send compute to the CPU that we want on the GPU instead? (A sketch of the Vulkan export path follows this list.)
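
For context, here is a minimal sketch of the Vulkan export path mentioned above, assuming a PyTorch build compiled with the Vulkan backend enabled; the model and file name are illustrative, not part of my actual pipeline:

```python
import torch
import torchvision
from torch.utils.mobile_optimizer import optimize_for_mobile

# Illustrative model; any scriptable vision model exports the same way
model = torchvision.models.mobilenet_v2(weights=None).eval()
scripted = torch.jit.script(model)

# Rewrites the graph for the Vulkan GPU backend; requires a PyTorch build
# with USE_VULKAN enabled, otherwise this pass is unavailable
vulkan_module = optimize_for_mobile(scripted, backend="vulkan")
vulkan_module._save_for_lite_interpreter("mobilenet_vulkan.ptl")

# On the device, inputs are moved to the backend with tensor.vulkan();
# ops without a Vulkan kernel can fall back to the CPU, which is exactly
# the silent-CPU-dispatch concern raised in the question above.
```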

A few background notes follow, in the hope they are helpful context for my question.

PyTorch has mainly supported CUDA. A ROCm (Radeon Open Compute) build of PyTorch does exist, but it has a reputation for being difficult to use for real-world deep learning. There are also tools for porting CUDA code to ROCm, such as AMD’s HIPIFY, though I’m not sure they have seen much production use yet.

PyTorch for AMD supports TorchVision and is ‘fully supported since ROCm 5.1 and upstreamed with PyTorch 1.12’, but the project seems geared to servers, not mobile.
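
As an aside, a ROCm build of PyTorch identifies itself at runtime; this small check assumes such a build is installed (the printed values are illustrative):

```python
import torch

# On a ROCm build, torch.version.hip is set (e.g. "5.1.0"); it is None on
# CUDA-only builds. ROCm builds reuse the "cuda" device name for AMD GPUs.
print(torch.version.hip)
print(torch.cuda.is_available())           # True if a supported AMD GPU is visible
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))   # e.g. an AMD Radeon device
```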

AMD support in PyTorch 2.0 may be more straightforward. PyTorch 2.0 introduces torch.compile, which generates kernels via the Triton compiler, a Python-based language that ‘provides much higher productivity than CUDA, but with the ability to beat the performance of highly optimized libraries like cuDNN with clean and simple code’:

“Triton can automatically optimize kernels generated by machine learning compilers such as TorchInductor for multiple AI accelerators including AMD Instinct GPU accelerator by leveraging hardware-specific features of the AMD CDNA™ GPU architecture.”
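Below is a minimal sketch of what that path looks like, assuming a PyTorch 2.0+ ROCm build where the AMD GPU appears under the “cuda” device name; whether Inductor’s Triton kernels actually run on Navi 2 is exactly what I’m asking:

```python
import torch
import torchvision

# Assumes a PyTorch 2.0+ ROCm build; the AMD GPU is addressed as "cuda"
model = torchvision.models.resnet18(weights=None).eval().to("cuda")

# TorchInductor (the default torch.compile backend) emits Triton kernels
# for GPU targets
compiled = torch.compile(model)

x = torch.randn(1, 3, 224, 224, device="cuda")
with torch.inference_mode():
    y = compiled(x)
print(y.shape)  # torch.Size([1, 1000])
```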

Before choosing to compile for AMD, I ask:

  • Does Triton support the AMD Navi 2 GPU (an RDNA 2 part), given that the statement above mentions only the CDNA architecture?
  • Can we cross-compile for AMD Navi 2 from a server with an NVIDIA GPU?

For PyTorch Mobile, we need to adapt trained models via quantization and conversion to TorchScript, an intermediate representation of a PyTorch model. PyTorch’s mobile optimization passes may help us run on the AMD Navi 2 device, but GPU support is either nonexistent or very early.
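
A minimal sketch of that CPU export path, with dynamic quantization standing in for whichever quantization scheme fits the model (the model and file name are illustrative):

```python
import torch
import torchvision
from torch.utils.mobile_optimizer import optimize_for_mobile

model = torchvision.models.mobilenet_v2(weights=None).eval()

# Dynamic quantization of linear layers; static or quantization-aware
# training would be the heavier-weight alternatives for conv-heavy models
quantized = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

# TorchScript conversion plus the mobile optimization passes (CPU backend)
scripted = torch.jit.script(quantized)
mobile_module = optimize_for_mobile(scripted)
mobile_module._save_for_lite_interpreter("mobilenet_quantized_cpu.ptl")
```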

Questions:

  • Can PyTorch Mobile use the AMD Navi 2 GPU? The early prototype work (Vulkan, NNAPI) has not seen wide use; the NNAPI route is sketched below.
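
For reference, this is roughly what the NNAPI route looks like, following PyTorch’s NNAPI prototype; whether the device’s NNAPI driver dispatches to the Navi 2 GPU is up to the vendor (model and file name are illustrative):

```python
import torch
import torch.backends._nnapi.prepare
import torchvision

model = torchvision.models.mobilenet_v2(weights=None).eval()

# NNAPI conversion wants a fixed-shape, channels-last example input
example = torch.zeros(1, 3, 224, 224).contiguous(memory_format=torch.channels_last)
example.nnapi_nhwc = True  # mark the tensor as NHWC for the converter

with torch.no_grad():
    traced = torch.jit.trace(model, example)

nnapi_module = torch.backends._nnapi.prepare.convert_model_to_nnapi(traced, example)
nnapi_module._save_for_lite_interpreter("mobilenet_nnapi.ptl")

# At runtime, NNAPI decides where the model executes (CPU, GPU, DSP, NPU)
# based on the vendor driver; PyTorch cannot force it onto the GPU.
```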