Is the AMX accelerator used on Apple silicon?

From issue #47702 on the PyTorch repository, it is not yet clear whether PyTorch already uses AMX on Apple silicon to accelerate computations. It might, because PyTorch relies on the operating system’s BLAS library, which is Accelerate on macOS. For reasons not described here, Apple has released little documentation on the AMX since its debut in the A13 chip.
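One quick way to check the BLAS part of this is to look at what the installed PyTorch build reports about itself. The sketch below is only a rough check: `accelerate_lines` is a hypothetical helper name, and I am assuming the text returned by `torch.__config__.show()` names the BLAS backend somewhere in its build settings. A hit would only show that PyTorch links against Accelerate, not that Accelerate dispatches to the AMX.

```python
def accelerate_lines(config_text):
    """Return the lines of a build-config dump that mention Accelerate."""
    return [
        line.strip()
        for line in config_text.splitlines()
        if "accelerate" in line.lower()
    ]


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed; nothing to inspect.")
    else:
        # torch.__config__.show() returns the build configuration as a string.
        hits = accelerate_lines(torch.__config__.show())
        if hits:
            print("\n".join(hits))
        else:
            print("No mention of Accelerate in the build config.")
```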

If PyTorch does already use AMX, then that is ~1.3 TFLOPS of processing power. For comparison, the M1 GPU has 2.6 TFLOPS. The issue linked above was raised partially because PyTorch lacked hardware acceleration on Apple devices for a very long time. If AMX is in fact used and has comparable performance to GPU acceleration, then many people might want to know.

Could anyone investigate whether the AMX is being used? You may need to learn a bit of Swift, which provides direct access to Accelerate and microsecond-level precision for profiling. Note that the M1 has one AMX, while the M1 Pro and M1 Max have two. Here are some helpful links for anyone who wishes to investigate this:
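Before reaching for Swift, a rough first measurement can be done from Python. This is a minimal sketch, not a proper profiler: the helper names (`matmul_flops`, `measure_tflops`), the matrix size of 4096, and the repeat count are my own choices. It times a square float32 matrix multiply on the CPU and converts the best run into TFLOPS, counting 2·n³ floating-point operations per multiply.

```python
import time


def matmul_flops(n):
    """FLOPs for an n x n by n x n matmul: n^3 multiplies plus n^3 adds."""
    return 2 * n ** 3


def measure_tflops(run_matmul, n, repeats=10):
    """Time run_matmul() several times and return TFLOPS for the fastest run."""
    run_matmul()  # warm-up so one-time setup cost is not measured
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        run_matmul()
        best = min(best, time.perf_counter() - start)
    return matmul_flops(n) / best / 1e12


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed; skipping the benchmark.")
    else:
        n = 4096  # arbitrary size, large enough to keep the cores busy
        a = torch.randn(n, n, dtype=torch.float32)
        b = torch.randn(n, n, dtype=torch.float32)
        tflops = measure_tflops(lambda: torch.mm(a, b), n)
        print(f"CPU float32 matmul: {tflops:.2f} TFLOPS")
```

A result close to the ~1.3 TFLOPS figure mentioned above would be suggestive that something beyond the ordinary CPU vector units is doing the work, but it is not proof that the AMX is involved.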


Some more helpful links:

This issue suggests that AMX is used as of the MPS nightlies:

I’ve run the same example on my vanilla M1 using a nightly and get 100% CPU use, consistent with just 1 AMX, vs 2 for the OP.

I did a cursory look through PRs and didn’t see Accelerate explicitly mentioned…

I forgot to hyperlink the following comment on this thread. The AMX needs all 8 performance cores to be in use for full utilization.