According to the docs, the MPS backend uses the GPU on M1/M2 chips via Metal compute shaders:
The `mps` device enables high-performance training on GPU for macOS devices with the Metal programming framework. It introduces a new device to map machine learning computational graphs and primitives onto the highly efficient Metal Performance Shaders Graph framework and tuned kernels provided by the Metal Performance Shaders framework, respectively. The new MPS backend extends the PyTorch ecosystem and gives existing scripts the capability to set up and run operations on the GPU.
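A minimal sketch of how a script opts into the MPS backend (falling back to CPU when MPS is unavailable, so the same code runs anywhere):

```python
import torch

# Pick the MPS device when the backend is built and the hardware supports it;
# otherwise fall back to the CPU.
device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")

# Smoke test: a small matmul runs on whichever device was picked.
x = torch.randn(4, 4, device=device)
y = (x @ x).cpu()
```

From here, `model.to(device)` and `batch.to(device)` are all that's needed to move an existing training script onto the GPU.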
According to the following repository, ml-macos-performance, inference on the ANE is roughly 6.5× faster than the GPU (and about 12× faster than the CPU):
densenet121_keras_applications
  ANE : latency 0.00127 s, RPS 784.7
  GPU : latency 0.00827 s, RPS 120.9
  CPU : latency 0.01535 s, RPS  65.2
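For context on how numbers like these are typically produced: latency is the mean wall-clock time per call over many repeated runs, and RPS is just its reciprocal. A minimal sketch (the `predict` callable here is a hypothetical stand-in for the real model invocation, not the repository's actual harness):

```python
import time

def benchmark(predict, n_runs=100):
    """Time n_runs calls to predict(); return (mean latency in s, RPS)."""
    start = time.perf_counter()
    for _ in range(n_runs):
        predict()
    latency = (time.perf_counter() - start) / n_runs
    return latency, 1.0 / latency

# Example with a cheap dummy workload in place of a model call.
latency, rps = benchmark(lambda: sum(range(1000)))
```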
I was wondering: would training performance be better if we could also use Apple's Neural Engine?
There are obviously some restrictions (see the unsupported Neural Engine layers), but that should be similar to the situation with Google's TPUs.
It seems there may be too much protection/security around calling the ANE directly; judging by George Hotz's tinygrad, maybe that is the biggest blocker?