Is there any reason iOS Metal libtorch uses MPSImage to make tensors?

It seems that MetalTensor is created using MPSImage, from here.

Due to this limitation, it seems it is not possible to create a tensor whose length exceeds 16384 (or 8192, depending on the device), because of the MTLTextureDescriptor width/height limit.

This is problematic when processing speech data, which is usually longer than that limit. It fails with the following error:

-[MTLTextureDescriptorInternal validateWithDevice:], line 1344: error 'Texture Descriptor Validation
MTLTextureDescriptor has width (160000) greater than the maximum allowed size of 16384.
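For reference, this is roughly the export path I'm following (a sketch along the lines of the PyTorch iOS GPU workflow; the Conv2d model and shapes below are just placeholders standing in for a real speech model):

```python
import torch
from torch.utils.mobile_optimizer import optimize_for_mobile

# Placeholder model: any module whose ops get rewritten to their
# MPSImage-backed Metal counterparts will hit the same texture limit.
model = torch.nn.Sequential(torch.nn.Conv2d(1, 8, kernel_size=3, padding=1)).eval()
scripted = torch.jit.script(model)

# backend="metal" rewrites supported ops for the Metal backend,
# which stores tensors as MPSImages on device.
metal_model = optimize_for_mobile(scripted, backend="metal")
metal_model._save_for_lite_interpreter("model_metal.ptl")

# On device, a 10 s / 16 kHz waveform laid out along one spatial axis
# (e.g. shape [1, 1, 1, 160000]) would need a texture 160000 pixels wide,
# which exceeds the 16384-pixel MTLTextureDescriptor limit and triggers
# the validation error above.
example_input = torch.randn(1, 1, 1, 160000)
```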

Here is my question: is there any reason to implement MetalTensor using MPSImage rather than MTLBuffer? Let me know if I am misunderstanding something.

Hi, the primary reason MetalTensor uses MPSImage is that we had image-processing models in mind when creating the Metal backend. In that context, image tensors were more performant, especially since we were using MPSCNN for key operators such as convolution, which consumed and produced MPSImages.

Of course, as you've mentioned, MPSImage does not work well for speech models because their tensor shapes have different properties. The plan was to eventually offer the option to use MTLBuffer for these use cases, but unfortunately we never got around to it because the focus shifted to enabling the CoreML delegate, which we found to be more performant than the Metal backend in most cases. I would recommend giving the CoreML delegate a try since you are running into issues with Metal.
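For reference, lowering to the CoreML delegate looks roughly like this (a sketch following the CoreML delegate tutorial; the MobileNetV2 model and the input/output shapes are placeholders, so swap in your own speech model and shapes):

```python
import torch
import torchvision
from torch.backends._coreml.preprocess import CompileSpec, TensorSpec, CoreMLComputeUnit

# Placeholder model; replace with your own traced speech model.
model = torchvision.models.mobilenet_v2(pretrained=True).eval()
traced = torch.jit.trace(model, torch.rand(1, 3, 224, 224))

compile_spec = {
    "forward": CompileSpec(
        inputs=(TensorSpec(shape=[1, 3, 224, 224]),),
        outputs=(TensorSpec(shape=[1, 1000]),),
        backend=CoreMLComputeUnit.ALL,  # CPU / GPU / Neural Engine
        allow_low_precision=True,
    ),
}

# Lowers the TorchScript module to the CoreML delegate and saves it
# for the lite interpreter on iOS.
coreml_model = torch._C._jit_to_backend("coreml", traced, compile_spec)
coreml_model._save_for_lite_interpreter("mobilenet_v2_coreml.ptl")
```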

Thanks for the answer. Converting a model using CompileSpec seems really similar to converting a PyTorch model into an .mlmodel using coremltools. What's the difference between _jit_to_backend('coreml') and coremltools.convert?

I believe _jit_to_backend('coreml') calls coremltools.convert under the hood. The primary difference is that coremltools.convert produces a CoreML model, whereas _jit_to_backend('coreml') wraps the CoreML model in a TorchScript wrapper so that you can still use TorchScript to execute the model.
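For comparison, the plain coremltools path would look roughly like this (placeholder model and shapes again; the exact convert arguments depend on your coremltools version):

```python
import torch
import torchvision
import coremltools as ct

model = torchvision.models.mobilenet_v2(pretrained=True).eval()
example = torch.rand(1, 3, 224, 224)
traced = torch.jit.trace(model, example)

# Produces a standalone CoreML model that you run through the CoreML
# runtime directly, with no TorchScript involved at inference time.
# Depending on your coremltools version, you may need convert_to="mlprogram"
# and a .mlpackage extension instead.
mlmodel = ct.convert(traced, inputs=[ct.TensorType(shape=example.shape)])
mlmodel.save("mobilenet_v2.mlmodel")
```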
