This thread is for carrying on any discussion from:
It seems that Apple is choosing to leave Intel GPUs out of the PyTorch backend, when they could theoretically support them. For reference, on the other thread, I pointed out that Apple did the same thing with their TensorFlow backend. When it was released, I only owned an Intel Mac mini and could not run GPU-accelerated TF. Other people may feel the same way, even though M1 is more common now.
Their earliest (now archived) TF 2.4 backend, built on MLCompute, crashed at runtime on the Mac mini after allocating 40 GB of virtual memory. The second backend officially dropped support for Intel GPUs, which are still a large part of their consumer base.
Sorry for the inaccurate answer on the previous post.
After some more digging, you are absolutely right that this is supported in theory.
The reason we disable it is that, during our experiments, we observed these GPUs are not very powerful; for most users the CPU will actually be faster.
So while many users do have these GPUs, most of them should not use them for ML workloads.
I don’t plan on compiling PyTorch myself, as that isn’t my primary ML project, but I will inject my opinion here. I think it’s a bad idea to prevent the user from accessing something. Most people won’t have the patience or experience to compile PyTorch from source and use the compiled build products ergonomically. As someone who makes software for end users, I believe it should be up to the user to decide. Especially if someone happens to run a CPU-intensive process alongside their ML process, where the GPU would be the only part of the chip left free for computation. This would also make your PyTorch backend stand out from the TF backend.
I think it would be best to enable support from the start, then disable it if there’s a strong signal from users to do so. I recommend putting a warning in the PyTorch docs saying “this may be slow on Intel GPUs”. Or at the very least, put a large notice telling Intel Mac users how to compile PyTorch from source if they want to test an Intel Mac GPU.
Edit: It would also be weird to have a script on macOS that profiles the GPU or uses it in some way, only to have the framework disable acceleration when you switch between an Apple silicon and an Intel Mac. Maybe you could provide a hidden or documented option to re-enable execution on the Intel device through the Python API. It should be extremely simple to add that feature to PyTorch - just a conditional statement surrounding your cited Objective-C code. Although I’m not going to make a PR to do so myself.
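To illustrate the idea, here is a rough sketch of what such an opt-in could look like. Everything here is hypothetical: the environment-variable name `PYTORCH_MPS_ALLOW_INTEL_GPU` and the helper function are my own invention, not part of PyTorch.

```python
import os
import warnings

# Hypothetical sketch of an opt-in gate -- not actual PyTorch code.
# Default behavior stays the same (Intel GPUs disabled), but users can
# re-enable the device without recompiling from source.

def mps_device_is_allowed(is_apple_silicon: bool) -> bool:
    """Return True if the detected GPU may be used by the MPS backend."""
    if is_apple_silicon:
        return True
    # Escape hatch for Intel Macs: opt in explicitly, with a warning.
    if os.environ.get("PYTORCH_MPS_ALLOW_INTEL_GPU", "0") == "1":
        warnings.warn(
            "MPS on Intel GPUs may be slower than the CPU; "
            "benchmark before relying on it."
        )
        return True
    return False
```

The native check would live in the backend’s Objective-C device-selection code rather than Python, but the logic is the same single conditional either way.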
I concur with @philipturner. This should be built into the library itself. PyTorch isn’t an end-user product. It should allow its developers to do what they want. Especially when this situation could easily be controlled by a simple boolean check. Recompiling the library seems like overkill for this purpose.
One big reason why I’m dead set on using Intel GPUs is my personal project, the revival of Swift for TensorFlow (S4TF). This is another ML framework like PyTorch, but different in that it could theoretically run on iOS and could take drastically less time to compile. There are going to be two compile options. One is the old version, which uses the TensorFlow code base as a backend and is CPU-only on macOS. The other uses a small custom code base, is GPU-only, and runs on iOS and macOS, among other platforms. The code base can be small because system libraries (MPS and MPSGraph) contain the kernels and graph compiler. Or, in the case of OpenCL, the kernel library is DLPrimitives, which is tiny.
For something that’s GPU-only, it will be mandatory to use the Intel GPU on certain Macs. The upper limit of ALU utilization for matrix multiplication is around 90% on Intel GPUs. That translates to ~350 GFLOPS for the Intel UHD 630. Compare that to the CPU, which is on the order of tens of GFLOPS. In theory, if all other bottlenecks are eliminated, most models would run faster on the Intel GPU than the CPU.
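As a sanity check, here is the back-of-the-envelope arithmetic behind those numbers. The peak and CPU figures are the rough values assumed above, not measurements:

```python
# Rough figures from the discussion above -- assumptions, not benchmarks.
peak_gflops = 400.0        # theoretical FP32 peak of the Intel UHD 630
max_utilization = 0.90     # upper limit of ALU utilization for GEMM
sustained_gflops = peak_gflops * max_utilization  # ~360, i.e. roughly 350

cpu_gflops = 50.0          # "tens of GFLOPS" order of magnitude for the CPU
speedup = sustained_gflops / cpu_gflops
print(f"~{sustained_gflops:.0f} GFLOPS on the iGPU, ~{speedup:.0f}x over CPU")
```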
The big “if” is whether those bottlenecks can be eliminated. I hypothesize that CPU overhead, or model configurations that underutilize the GPU, are why it runs slowly on PyTorch. For S4TF, I have quite extensive plans to reduce CPU overhead, leaving the only problem as models that underutilize the GPU - for example, oddly shaped matrix multiplications, or convolutions that can’t use Winograd. Potentially, the entire Intel GPU architecture is terrible at ML, even the 10-TFLOPS Arc Alchemist. But that conclusion contradicts the fact that Intel invested money and time making matrix multiplication kernels for Intel GPUs in oneDNN.
We will have to wait and see why Intel GPUs are so slow for training - whether because of PyTorch’s design, or some other fundamental problem that can’t be solved in an S4TF backend. Even if it is slower, I will definitely give the user the choice of CPU or GPU on Macs with only an Intel GPU.
@albanD I’m curious about how bad the Intel GPU was during internal benchmarks. Before getting into this, I have a few questions:
Did you test only the 400-GFLOPS UHD 630, or also the 800-GFLOPS Iris Plus? The second processor has 35% of the FLOPS of a 7-core M1, with relatively similar ALU utilization during matrix multiplications. It should also have identical main memory bandwidth.
Did you try using shared memory on Intel iGPUs, which would bring performance closer to Apple iGPUs?
When you said the Intel iGPU was slower, was that relative to single-core or multi-core CPU?
Let’s say that someone can only use operators available to MPS. They can’t process double-precision numbers either. They run every single operation on the GPU. Based on your benchmarks, what is the performance delta of ____ compared to single-core CPU?
- Apple integrated GPU
- Intel integrated GPU
Intel Macs don’t have AMX, so CPU matrix multiplications are considerably slower. If you could provide both average and worst-case metrics, that would be just what I’m looking for.
I’m asking this because the ML backend I’m developing is GPU-only. Removing CPU operations makes my code base smaller and more maintainable. In an era where exponential growth in processing power comes from greater parallelization, single-core CPU is becoming increasingly obsolete. That is why I’m pursuing the intense latency optimizations described in “Sequential throughput of GPU execution”. I have to make ML operators run as fast as possible on an Intel iGPU, because I cannot run them on the CPU.
I would argue that this is problematic because PyTorch is an end-user product. Most users don’t have the Git or command-line experience to compile PyTorch. They might not even know that Objective-C exists; Python may be their first programming language. Are we telling them that because of their lack of experience, they don’t have the right to test their iGPU for machine learning? Even if it is slower, they lack access to the tools needed to prove it is slower and to reproduce that proof themselves. These are concepts we take for granted in the field of science, where reproducibility is mandatory.
This is something Apple benefits from, because the only other options are either (1) upgrade to an M1 Mac or (2) switch to a PC and get a cheaper Nvidia GPU with tensor cores. Now what if they are a teenager who can’t muster the hundreds to over a thousand dollars needed to upgrade their hardware, because their parents aren’t giving them that stuff for free? I have been in this exact position before. I had a powerful Apple GPU, and made a whole research paper centered on it. But the M1-family GPU was in my iPhone, not my Mac.