Is the AMX accelerator used on Apple silicon?

Issue #47702 on the PyTorch repository leaves it unclear whether PyTorch already uses AMX on Apple silicon to accelerate computations. It might, because it relies on the operating system’s BLAS library, which is Accelerate on macOS. For reasons not described here, Apple has released little documentation on AMX since its debut in the A13 chip.

If PyTorch does already use AMX, then that is ~1.3 TFLOPS of processing power. For comparison, the M1 GPU has 2.6 TFLOPS. The issue linked above was raised partially because PyTorch lacked hardware acceleration on Apple devices for a very long time. If AMX is in fact used and has comparable performance to GPU acceleration, then many people might want to know.

Could anyone investigate whether AMX is being used? You may need to learn a bit of Swift, which provides direct access to Accelerate and microsecond-level precision for profiling. Note that the M1 has one AMX block, while the M1 Pro and M1 Max each have two. Here are some helpful links for anyone who wishes to investigate this:
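As a starting point before reaching for Swift, a rough throughput measurement can also be sketched in Python. The snippet below is only a minimal sketch: `matmul_flops`, `measure_gflops`, and `naive_matmul` are hypothetical helper names, the FLOP count uses the conventional 2n³ estimate for a square matrix multiply, and the pure-Python multiply is just a stand-in so the script runs without any dependencies.

```python
import time


def matmul_flops(n: int) -> int:
    """Conventional FLOP estimate for an n x n by n x n multiply (about 2 * n^3)."""
    return 2 * n ** 3


def measure_gflops(run, n: int, repeats: int = 5) -> float:
    """Time `run()` (assumed to do one n x n matmul) and return the best GFLOPS seen."""
    run()  # warm-up call so one-time setup costs are not measured
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        run()
        best = min(best, time.perf_counter() - start)
    best = max(best, 1e-12)  # guard against a zero reading from the timer
    return matmul_flops(n) / best / 1e9


def naive_matmul(a, b):
    """Pure-Python stand-in workload; swap in the library call you want to profile."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


if __name__ == "__main__":
    n = 64
    a = [[1.0] * n for _ in range(n)]
    b = [[2.0] * n for _ in range(n)]
    print(f"{measure_gflops(lambda: naive_matmul(a, b), n):.4f} GFLOPS")
```

To profile PyTorch on the CPU, one could create `a = torch.rand(n, n)` and `b = torch.rand(n, n)` with a much larger `n` and pass `lambda: torch.mm(a, b)` as `run`. Comparing the result against the ~1.3 TFLOPS figure above would only be suggestive: a throughput number on its own does not prove which hardware unit did the work.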

Some more helpful links: