Current state of MPS

I’m considering purchasing a new MacBook Pro and trying to decide whether or not it’s worth it to shell out for a better GPU.

When I last looked at PyTorch’s MPS support, most operators had not yet been ported to MPS, and PYTORCH_ENABLE_MPS_FALLBACK was required to train just about any model. The overhead of transferring data between MPS and CPU made MPS training slower than plain CPU training. I also saw reports of training-accuracy issues on MPS.

Obviously, a lot of hard work has happened since then, and more operators are ported to MPS every week. So I’m wondering if anyone down in the trenches can give a “State of the Union” for MPS support. I’m particularly interested in the following questions:

  1. Is MPS still slower than CPU because PYTORCH_ENABLE_MPS_FALLBACK is still required?
  2. Is there a rough timeline for when MPS will be faster than CPU?
  3. Is there a rough timeline for when (almost) all operators will be ported to MPS?
  4. What performance gain can we reasonably expect once 3 is complete?

My workflow is relatively simple: I use PyTorch Lightning and TorchGeo to train ResNet and ViT models from timm and segmentation-models-pytorch. Bonus points if anyone has actually benchmarked CPU vs. MPS performance.
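
For anyone who wants to run that comparison themselves, here is a minimal sketch, assuming PyTorch 2.x (for `torch.mps.synchronize()`). The helper names `time_steps`, `benchmark_train_step`, and `compare_devices`, the small CNN, and the batch size are hypothetical stand-ins, not part of any library; swap in the timm or segmentation-models-pytorch model you actually train. It sets PYTORCH_ENABLE_MPS_FALLBACK before torch is imported so unsupported ops fall back to the CPU, and it synchronizes MPS before reading the clock because MPS work is queued asynchronously.

```python
import os
import time
from typing import Callable, Dict

# Set before torch is imported (to be safe) so that ops without an MPS
# kernel fall back to the CPU instead of raising an error.
os.environ.setdefault("PYTORCH_ENABLE_MPS_FALLBACK", "1")


def time_steps(step: Callable[[], None], sync: Callable[[], None],
               warmup: int = 3, iters: int = 10) -> float:
    """Return the mean wall-clock seconds per call of `step` (hypothetical helper)."""
    for _ in range(warmup):  # warm-up: kernel setup, allocator caches
        step()
    sync()  # drain any queued device work before starting the clock
    start = time.perf_counter()
    for _ in range(iters):
        step()
    sync()  # MPS runs asynchronously; wait for it before stopping the clock
    return (time.perf_counter() - start) / iters


def benchmark_train_step(device_name: str, batch_size: int = 16, iters: int = 10) -> float:
    """Time one forward/backward/optimizer step of a small stand-in CNN."""
    import torch
    from torch import nn

    device = torch.device(device_name)
    # Hypothetical stand-in model; replace with e.g. a timm ResNet or ViT.
    model = nn.Sequential(
        nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
        nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, 10),
    ).to(device)
    opt = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = nn.CrossEntropyLoss()
    x = torch.randn(batch_size, 3, 224, 224, device=device)
    y = torch.randint(0, 10, (batch_size,), device=device)

    def step() -> None:
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()

    def sync() -> None:
        if device.type == "mps":
            torch.mps.synchronize()

    return time_steps(step, sync, iters=iters)


def compare_devices(iters: int = 10) -> Dict[str, float]:
    """Seconds per training step on CPU and, if available, on MPS."""
    import torch

    devices = ["cpu"] + (["mps"] if torch.backends.mps.is_available() else [])
    return {name: benchmark_train_step(name, iters=iters) for name in devices}
```

Calling `compare_devices()` on the machine in question returns seconds per step for each available device. Timing the whole training step, rather than a single op, means any CPU fallbacks and the resulting data transfers are included in the MPS number.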

Same here. I’d also like to know whether an M2 Ultra is worth buying for training on MPS.