I’m considering purchasing a new MacBook Pro and trying to decide whether or not it’s worth it to shell out for a better GPU.
Last I looked at PyTorch’s MPS support, the majority of operators had not yet been ported to MPS, and `PYTORCH_ENABLE_MPS_FALLBACK` was required to train just about any model. The overhead of transferring data between MPS and CPU meant that MPS training was actually slower than CPU training. I also saw some reports of issues with training accuracy on MPS.
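For context, this is roughly how the fallback gets switched on. It's a minimal sketch that assumes the variable has to be in the environment before `torch` is first imported. The usual advice is to set it from the shell, e.g. `PYTORCH_ENABLE_MPS_FALLBACK=1 python train.py`. `pick_device` is just a hypothetical helper name, not a PyTorch API:

```python
import os

# Assumption: the flag is read when torch loads, so set it before importing torch.
# With it enabled, ops that lack an MPS kernel run on the CPU instead of raising.
os.environ["PYTORCH_ENABLE_MPS_FALLBACK"] = "1"


def pick_device() -> str:
    """Hypothetical helper: prefer Apple's MPS backend, else use the CPU."""
    try:
        import torch
    except ImportError:
        return "cpu"  # torch not installed; nothing to accelerate
    mps = getattr(torch.backends, "mps", None)  # absent in very old releases
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


device = pick_device()
print(f"Using device: {device}")
```

Every op that falls back copies its tensors from the GPU to the CPU and back, which is where the overhead I mentioned comes from.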
Obviously, a lot of hard work has happened since then, and more operators are ported to MPS every week. So I’m wondering if anyone down in the trenches can give a “State of the Union” for MPS support. I’m particularly interested in the following questions:
- Is MPS still slower than CPU because `PYTORCH_ENABLE_MPS_FALLBACK` is required?
- Is there an approximate timeline for how long you think it will be until MPS is faster than CPU?
- Is there an approximate timeline for how long you think it will be until (almost) all operators have been ported to MPS?
- What’s the theoretical performance bump we can reasonably expect once (almost) all operators have been ported?
My workflow is relatively simple: I use PyTorch Lightning and TorchGeo to train ResNet and ViT models from timm and segmentation-models-pytorch. Bonus points if anyone has actually benchmarked CPU vs. MPS performance.
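For anyone who wants to run a quick comparison, here's a minimal timing sketch. It assumes a hypothetical zero-argument `step` callable that runs one training step on a fixed batch. You run it once with the model and batch on `"cpu"` and once on `"mps"`. MPS kernels are queued asynchronously, so the sketch also takes a `sync` hook. On MPS you'd pass `torch.mps.synchronize` (present in PyTorch 2.x). On CPU the default no-op is fine:

```python
import statistics
import time
from typing import Callable


def benchmark(
    step: Callable[[], None],
    sync: Callable[[], None] = lambda: None,
    warmup: int = 3,
    iters: int = 10,
) -> float:
    """Return the median wall-clock seconds per call of `step`."""
    # Warm-up calls absorb one-time costs (kernel compilation, caching, allocation).
    for _ in range(warmup):
        step()
    sync()
    times = []
    for _ in range(iters):
        start = time.perf_counter()
        step()
        sync()  # wait for queued device work so the time isn't undercounted
        times.append(time.perf_counter() - start)
    # The median is less sensitive than the mean to occasional slow iterations.
    return statistics.median(times)


# Stand-in workload; replace with a real training step on each device.
t = benchmark(lambda: sum(range(10_000)))
print(f"median step time: {t * 1e3:.3f} ms")
```

The median over several iterations after a warm-up gives a steadier per-step number than timing a whole epoch, although an epoch-level comparison in Lightning would also capture data-loading differences.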