PyTorch support for M1 Mac GPU


Back in Sept 2021, a post said that PyTorch support for M1 Mac GPUs was being worked on and should be out soon. Are there any further updates on this, please?



For the moment, TF works pretty well:

Even pure NumPy is really fast with the right compiler flags.

Hope to see PyTorch support soon. I am loving the new DataPipes and functorch.


I’d like to draw attention to this question since I would also like it answered! 🙂


I am curious as well.


Same! Would love to see support for this 🙂


Same here! GPU support on M1 with PyTorch would be a game changer for me.

It's supported now!


I am getting mixed results. On my Mac Studio Ultra I installed the latest conda and the nightly torch build. Out of the box, in CPU mode, one small experiment ran about 8x faster than before (also on CPU), which is nice. However, when I asked for “mps”, performance dropped to 8x worse than the old CPU performance (64x worse than the new CPU performance). On other projects I got errors related to broadcast incompatibilities. All of these could be my own mistakes, but the same code does work with CUDA (on another machine). Anyway, just FYI. Looking forward to using this as it improves. Thanks to the team who made this possible!

Are there any new attributes for using M1/Metal natively? I got the latest nightly and played with it a little, but couldn’t find any evidence that the SoC GPU was engaged. For example, torch.cuda.is_available() is still False, which I guess you’d expect, but if I create a tensor, tensor.device still shows CPU. All torch.device() calls are still set to cuda. Any pointers? Is it running on the Metal GPU now without reporting that it is on a GPU, or something?
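
For anyone else looking for the attribute: as far as I can tell, builds with Metal support expose it as a separate "mps" device rather than through torch.cuda, and tensors stay on the CPU unless you pass a device explicitly. Below is a minimal sketch of that check. The helper names choose_device_name and main are my own (hypothetical), and the fallback to "cpu" when torch is missing is just there so the script runs anywhere.

```python
import importlib.util


def choose_device_name(mps_available: bool, cuda_available: bool) -> str:
    """Prefer Apple's Metal backend ("mps"), then CUDA, then the CPU."""
    if mps_available:
        return "mps"
    if cuda_available:
        return "cuda"
    return "cpu"


def main() -> str:
    # Assumption: fall back to "cpu" if PyTorch is not installed at all.
    if importlib.util.find_spec("torch") is None:
        print("PyTorch is not installed")
        return "cpu"

    import torch

    # torch.backends.mps is absent in older builds, so look it up defensively.
    mps = getattr(torch.backends, "mps", None)
    name = choose_device_name(
        mps is not None and mps.is_available(),
        torch.cuda.is_available(),
    )
    device = torch.device(name)

    x = torch.ones(3)                 # no device argument: created on the CPU
    y = torch.ones(3, device=device)  # created directly on the chosen device
    z = x.to(device)                  # copied from the CPU to the chosen device
    print("selected:", device)
    print("x:", x.device, "y:", y.device, "z:", z.device)
    return name


selected = main()
```

If this prints "cpu" on an M1 machine, the installed build probably lacks MPS support (or the macOS version is too old), which would match tensor.device always showing CPU.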

Similar here: on an MBP M1 Max (10 CPU cores, 32 GPU cores, 64GB RAM), the new PyTorch nightly build 1.13.0.dev20220620 is 3x faster on the CPU than my old version 1.10.0 using the same CPUs. Speed using the GPU is terrible in comparison.

Also interesting: looking at usage across the 10 CPU cores, with 1.13 they are at ~15%, whereas with the 1.10 version they are all maxed out.

Clearly more is being optimized for the M1 chip in the new nightly build than just GPU support. The results are startling.

Tried again with the latest nightly builds. This time I used the PyTorch example. Same general results: an RTX 3090 runs it in 1 minute (rounding up), “cpu” mode on the M1 Ultra takes 32 minutes, and “mps” mode takes 46 minutes. I am concerned (and hopeful) that I am just doing something wrong in setup, because I have yet to see an example that runs faster in “mps” mode than in “cpu” mode. If anyone has any suggestions I’d appreciate them. Thanks. (Again, all praise to the team working on this, and I hope this feedback is constructive.)
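
One thing that may help when comparing “cpu” and “mps” numbers: GPU work is queued asynchronously, so a fair timing needs a few warm-up runs and a synchronize call before reading the clock. Here is a rough sketch of such a harness. time_fn and compare_devices are hypothetical names of mine, the 1024x1024 matmul is an arbitrary workload, and I am assuming torch.mps.synchronize is present (it is in recent releases, but I look it up defensively and skip the MPS run if it is missing).

```python
import importlib.util
import time


def time_fn(fn, sync=lambda: None, warmup=2, iters=5):
    """Return mean seconds per call of fn, calling sync() to drain queued work."""
    for _ in range(warmup):
        fn()
    sync()  # make sure warm-up work has finished before starting the clock
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    sync()  # wait for queued work before stopping the clock
    return (time.perf_counter() - start) / iters


def compare_devices(n=1024):
    """Time an n x n matmul on the CPU and, if usable, on the "mps" device."""
    if importlib.util.find_spec("torch") is None:
        return {}  # assumption: nothing to measure without PyTorch

    import torch

    results = {}
    a = torch.rand(n, n)
    results["cpu"] = time_fn(lambda: a @ a)

    mps_backend = getattr(torch.backends, "mps", None)
    mps_sync = getattr(getattr(torch, "mps", None), "synchronize", None)
    if mps_backend is not None and mps_backend.is_available() and mps_sync is not None:
        b = a.to("mps")
        results["mps"] = time_fn(lambda: b @ b, sync=mps_sync)
    return results


timings = compare_devices()
for name, seconds in timings.items():
    print(f"{name}: {seconds * 1e3:.2f} ms per matmul")
```

This only measures one large matmul, so it says nothing about a full training loop, where small tensors, CPU-to-GPU copies, or ops that lack a Metal kernel can dominate.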