This thread is for carrying on any discussion from:
It seems that Apple is choosing to leave Intel GPUs out of the PyTorch backend, when they could theoretically support them. For reference, on the other thread, I pointed out that Apple did the same thing with their TensorFlow backend. When it was released, I only owned an Intel Mac mini and could not run GPU-accelerated TF. Other people may feel the same way, even though M1 is more common now.
Their earliest (now archived) TF 2.4 backend, built on MLCompute, crashed at runtime on the Mac mini due to allocating 40 GB of virtual memory. The second backend officially removed support for Intel GPUs, which are still a large part of their consumer base.
Sorry for the inaccurate answer on the previous post.
After some more digging, you are absolutely right that this is supported in theory.
The reason we disable it is that, in our experiments, these GPUs turned out not to be very powerful, and most users are better off using the CPU, which will actually be faster.
And so while most users do have these processors, most of them should not use them for ML workloads.
I don’t plan on compiling PyTorch myself as that isn’t my primary ML project, but I will inject my opinion here. I think it’s a bad idea to prevent the user from accessing something. Most people won’t have the patience or experience to compile PyTorch from source and use the compiled build products ergonomically. If you make software for users, it should be up to the user to decide, especially if someone happens to run a CPU-intensive process alongside their ML process, in which case the GPU would be the part of the chip that’s free for computation. This will also make your PyTorch backend stand out from the TF backend.
I think it would be best to enable support from the start, then disable it if there’s a strong signal from people to do so. I recommend that you put a warning in the PyTorch docs saying “this may be slow on Intel GPUs”. Or at the very least, put a large notice telling Intel Mac users how to compile PyTorch from source if they want to test an Intel Mac GPU.
Edit: It would also be weird if you have a script on macOS that tries to profile the GPU or use the GPU in some way, only to have the framework disable acceleration when you switch between your Apple and Intel Mac. Maybe you could provide a hidden or documented option to re-enable execution on the Intel device through the Python API. It should be extremely simple to add that feature to PyTorch - just a conditional statement surrounding your cited Objective-C code. Although I’m not going to make a PR to do so myself.
I concur with @philipturner. This should be built into the library itself. PyTorch isn’t an end-user product. It should allow its developers to do what they want. Especially when this situation could easily be controlled by a simple boolean check. Recompiling the library seems like overkill for this purpose.
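To make the "simple boolean check" concrete, here is a minimal sketch of what an opt-in could look like. It is not PyTorch code: the function name `use_gpu` and the environment variable `ALLOW_INTEL_GPU` are hypothetical names made up for illustration, and the real check would live next to the device-selection code in the backend.

```python
import os

# HYPOTHETICAL name for illustration only; PyTorch does not define this variable.
OPT_IN_ENV_VAR = "ALLOW_INTEL_GPU"


def use_gpu(is_intel_gpu, env=None):
    """Decide whether GPU acceleration should be enabled.

    Non-Intel GPUs are always allowed. Intel GPUs are off by default
    and only enabled when the user explicitly opts in.
    """
    env = os.environ if env is None else env
    if not is_intel_gpu:
        return True
    # The single boolean check: respect the user's explicit choice.
    return env.get(OPT_IN_ENV_VAR, "0") == "1"


if __name__ == "__main__":
    print("Intel GPU enabled:", use_gpu(is_intel_gpu=True))
```

With a default like this, nothing changes for users who never set the flag, while anyone who wants to benchmark an Intel GPU can opt in without recompiling.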
One big reason why I’m dead set on using Intel GPUs is my personal project, the revival of Swift for TensorFlow (S4TF). This is another ML framework like PyTorch, but different in that it could theoretically run on iOS and could take drastically less time to compile. There are going to be two possible compile options. One is the old version, which uses the TensorFlow code base as a backend and is CPU-only on macOS. The other option uses a small custom code base, is GPU-only, and runs on iOS and macOS, among other platforms. The code base can be small because system libraries (MPS and MPSGraph) contain the kernels and graph compiler. Or, in the case of OpenCL, the kernel library is DLPrimitives, which is tiny.
For something that’s GPU-only, it will be mandatory to use the Intel GPU on certain Macs. The upper limit of ALU utilization for matrix multiplications is around 90% on Intel GPUs. This means ~350 GFLOPS of compute for the Intel UHD 630. Compare that to the CPU, which is on the order of tens of GFLOPS. In theory, if all other bottlenecks are eliminated, most models would run faster on the Intel GPU than the CPU.
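As a rough sanity check on the ~350 GFLOPS figure, here is a back-of-the-envelope sketch. The 24 execution units and 16 FP32 FLOPs per clock per EU are the commonly cited numbers for the UHD 630; the 1.0 GHz sustained clock is an assumption, and actual clocks vary by Mac model.

```python
# Back-of-the-envelope FP32 throughput estimate for the Intel UHD 630.
EXECUTION_UNITS = 24          # UHD 630 (GT2) configuration
FLOPS_PER_CLOCK_PER_EU = 16   # 2 FPUs x 4 lanes x 2 FLOPs per fused multiply-add
CLOCK_GHZ = 1.0               # ASSUMPTION: sustained GPU clock; varies by model
MATMUL_UTILIZATION = 0.90     # upper bound on ALU utilization quoted above


def peak_gflops(eus, flops_per_clock, clock_ghz):
    """Theoretical peak throughput in GFLOPS."""
    return eus * flops_per_clock * clock_ghz


peak = peak_gflops(EXECUTION_UNITS, FLOPS_PER_CLOCK_PER_EU, CLOCK_GHZ)
sustained = peak * MATMUL_UTILIZATION

print(f"peak: {peak:.0f} GFLOPS, at 90% utilization: {sustained:.0f} GFLOPS")
```

Under these assumptions the estimate lands at roughly 346 GFLOPS, in line with the ~350 GFLOPS figure; a higher sustained clock would raise it proportionally.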
The big “if” is whether bottlenecks are eliminated. I hypothesize that CPU overhead or model configurations that underutilize the GPU are why it runs slow on PyTorch. For S4TF, I have quite extensive plans to reduce CPU overhead, which leaves models that underutilize the GPU as the only remaining problem. For example, oddly shaped matrix multiplications or convolutions that can’t use Winograd. Potentially, the entire Intel GPU architecture is terrible at ML, even the 10 TFLOPS Arc Alchemist. But that conclusion contradicts the fact that Intel invested money and time making XMX kernels for Intel GPUs in oneDNN.
We will have to wait and see why the Intel GPUs are so slow for training, whether it is because of PyTorch’s design or some other fundamental problem that can’t be solved in an S4TF backend. Even if the GPU is slower, I will definitely give the user the choice of using the CPU or GPU on Macs with only an Intel GPU.