Building PyTorch 2.6.0 from source on macOS x86_64 (Intel)

Hi all :wave:

I know that pytorch for macOS x86_64 has been deprecated (see here [RFC] macOS x86 builds / test deprecation · Issue #114602 · pytorch/pytorch · GitHub), but I’d really love to try out flex attention on my Intel Mac :slightly_smiling_face:

So I tried building pytorch from source.
I’m on macOS 14.7.1 with an Intel i5-8500B CPU and using Apple clang version 16.0.0 (clang-1600.0.26.6).

This is pretty much what I tried:

git clone --recursive https://github.com/pytorch/pytorch
cd pytorch
git checkout tags/v2.6.0
git submodule sync
git submodule update --init --recursive

conda create -y -n pytorch
conda install cmake ninja
conda activate pytorch
pip install -r requirements.txt
pip install mkl-static mkl-include
export MACOS_DEPLOYMENT_TARGET=10.14 
export TORCH_BUILD_VERSION=2.6.0
export USE_MPS=0
export USE_CUDA=0
export USE_MKLDNN=1
export BUILD_TEST=1
export USE_DISTRIBUTED=0
export DEBUG=0
export MAX_JOBS=4
python setup.py bdist_wheel

and the build actually finishes.

But when I cd into build and run ctest, I get failures for some AVX2 and AVX512 test cases. Something like this:

49: [ RUN      ] LGamma/1.LGamma
49: /Users/user/pytorch/aten/src/ATen/test/vec_test_all_types.h:880: Failure
49: Expected equality of these values:
49:   nearlyEqual<UVT>(expArr[i], actArr[i], absErr)
49:     Which is: false
49:   true
49: -331.86198661322277!=-331.85460524980596
49: Failure Details:
49: lgamma "/Users/user/pytorch/aten/src/ATen/test/vec_test_all_types.cpp":581
49: Test Seed to reproduce: 7361236382287739
49: Arguments:
49: #	 vec[-99.999999999999986, -69.770666540670334, -8.3928940026911025, -47.748549820369952]
49: Expected:
49: #	vec[-331.86198661322277, -227.90282083881428, -10.252139627749084, -138.21178391749243]
49: Actual:
49: #	vec[-331.85460524980596, -227.90282083881431, -10.252139627749084, -138.21178391749243]
49: First mismatch Index: 0
49:
49: [  FAILED  ] LGamma/1.LGamma, where TypeParam = at::vec::AVX2::Vectorized<double> (0 ms)
49: [----------] 1 test from LGamma/1 (0 ms total)

I get why the AVX512 test crashes: the CPU doesn’t support those instructions. But the AVX2 test failing due to imprecision worries me :confused:
Are my build flags wrong? Maybe I’m lucky and someone here is also interested in getting the latest pytorch to work on Intel Macs :pray:
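
In case someone wants to poke at exactly this case, the failing test can be re-run in isolation with the standard googletest filter (the binary path below assumes the usual build/bin layout, adjust if yours differs):

cd build
ctest -R vec_test_all_types_AVX2 --output-on-failure
./bin/vec_test_all_types_AVX2 --gtest_filter='LGamma/1.LGamma'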

Also, does someone have a quick tip on how to get the Python tests for pytorch to run? Just pip installing the wheel, running pip install -r .ci/docker/requirements-ci.txt and then doing python test/run_test.py --include-slow resulted in an error telling me to contact the dev infra team, so I assume this is not how externals like me are supposed to run the tests?

I guess all of this boils down to the question: how do I properly build the latest pytorch version for x86_64 Intel Macs, and how can I verify that my build is not broken?

P.S.: Are there features whose compatibility with macOS x86_64 is already known to be broken, compared to the last release that still supported this platform?

Oh well, fuck it, I guess I’ll just share what I tried and found out so far. Get ready for some dumb stuff. This is the community support @seemethere was talking about lmao! This is probably full of silly mistakes and I’d love to be corrected. Here we go!

First off, one of my goals is to build pytorch so that it runs on macOS versions going back all the way to 10.14, so I’m using otool -l <some_binary> to check which macOS version a binary requires at minimum, to verify I’m on the right track. In the otool output you have to look for the LC_BUILD_VERSION command and its minos field. For older binaries you have to look for the LC_VERSION_MIN_MACOSX command and its version field instead. Don’t we love Apple renaming stuff?
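
For example, the relevant chunk of the otool output looks roughly like this (values purely illustrative):

$ otool -l torch/lib/libtorch_cpu.dylib | grep -A5 LC_BUILD_VERSION
      cmd LC_BUILD_VERSION
  cmdsize 32
 platform MACOS
    minos 10.15
      sdk 15.0
   ntools 1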

Checking all binaries in a directory by hand sucks, so here’s a script you can call like python3 otool.py <path> (thanks ChatGPT).

#!/usr/bin/env python3

import argparse
import subprocess
import sys
from pathlib import Path


def parse_mach_o_versions(file_path: Path):
    """Return (minos, sdk) parsed from `otool -l` output, or (None, None) if the file is not a Mach-O binary."""
    try:
        output = subprocess.check_output(["otool", "-l", str(file_path)], stderr=subprocess.DEVNULL)
        lines = output.decode("utf-8", errors="replace").splitlines()
    except (subprocess.CalledProcessError, FileNotFoundError):
        return None, None

    minos = None
    sdk = None

    # Newer binaries carry LC_BUILD_VERSION (minos/sdk fields), older ones LC_VERSION_MIN_MACOSX (version/sdk fields).
    inside_version_min_macosx = False
    inside_build_version = False

    for line in lines:
        line_stripped = line.strip()

        if line_stripped.startswith("cmd LC_VERSION_MIN_MACOSX"):
            inside_version_min_macosx = True
            inside_build_version = False
            continue
        elif line_stripped.startswith("cmd LC_BUILD_VERSION"):
            inside_build_version = True
            inside_version_min_macosx = False
            continue

        if inside_version_min_macosx:
            if line_stripped.startswith("version "):
                _, ver_value = line_stripped.split(maxsplit=1)
                minos = ver_value.strip()
            elif line_stripped.startswith("sdk "):
                _, sdk_value = line_stripped.split(maxsplit=1)
                sdk = sdk_value.strip()

            if minos and sdk:
                break

        if inside_build_version:
            if line_stripped.startswith("minos "):
                _, ver_value = line_stripped.split(maxsplit=1)
                minos = ver_value.strip()
            elif line_stripped.startswith("sdk "):
                _, sdk_value = line_stripped.split(maxsplit=1)
                sdk = sdk_value.strip()

            if minos and sdk:
                break

    return minos, sdk


def main():
    parser = argparse.ArgumentParser(
        description="Recursively find Mach-O binaries and show their minimum macOS version and SDK."
    )
    parser.add_argument("path", type=str, help="Path to a file or directory.")
    args = parser.parse_args()

    root_path = Path(args.path).resolve()

    if not root_path.exists():
        print(f"Error: Path '{root_path}' does not exist.", file=sys.stderr)
        sys.exit(1)

    if root_path.is_dir():
        for file_path in root_path.rglob("*"):
            if file_path.is_file():
                minos, sdk = parse_mach_o_versions(file_path)
                if minos and sdk:
                    print(f"{minos}\t{sdk}\t{file_path.relative_to(root_path)}")
    else:
        if root_path.is_file():
            minos, sdk = parse_mach_o_versions(root_path)
            if minos and sdk:
                print(f"{minos}\t{sdk}\t{root_path}")
        else:
            print(f"Error: '{root_path}' is not a file.", file=sys.stderr)
            sys.exit(1)


if __name__ == "__main__":
    main()

So let’s unzip torch-2.2.2-cp312-none-macosx_10_9_x86_64.whl and run the script on its contents.
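
Wheels are just zip archives, so something like this does the trick:

unzip -q torch-2.2.2-cp312-none-macosx_10_9_x86_64.whl -d torch-2.2.2-cp312-none-macosx_10_9_x86_64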

$ uv run otool.py torch-2.2.2-cp312-none-macosx_10_9_x86_64/
10.13   12.3    torch/_C.cpython-312-darwin.so
10.13   12.3    functorch/_C.cpython-312-darwin.so
10.13.6 10.14   functorch/.dylibs/libiomp5.dylib
10.13   12.3    torch/bin/protoc-3.13.0.0
10.13   12.3    torch/bin/torch_shm_manager
10.13   12.3    torch/bin/protoc
10.13   12.3    torch/lib/libtorch_python.dylib
10.13   12.3    torch/lib/libtorch.dylib
10.13   12.3    torch/lib/libtorch_global_deps.dylib
10.13.6 10.14   torch/lib/libiomp5.dylib
10.13   12.3    torch/lib/libtorch_cpu.dylib
10.13   12.3    torch/lib/libc10.dylib
10.13   12.3    torch/lib/libshm.dylib

Sus :thinking: So this macosx_10_9_x86_64 in the name of the wheel file was a lie? Dunno, did someone with macOS 10.9 (rofl) try to run this?

Anyway, let’s get going and try to build it ourselves. It’s kinda hard to get access to a real Intel Mac running macOS 10.14, so I’ll just switch to a Linux machine and use quickemu to spin up a macOS 10.14 VM. The quickemu documentation in its repo is actually really easy to understand and helpful. The VM runs painfully slowly, but that doesn’t stop me from installing the Xcode Command Line Tools and cloning the pytorch repo inside it.
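
From memory, the VM setup was roughly this (quickget fetches the recovery image for you; double-check the exact release name against the quickemu docs):

quickget macos mojave
quickemu --vm macos-mojave.conf

Inside the VM: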

xcode-select --install

git clone --recursive https://github.com/pytorch/pytorch
cd pytorch
git checkout tags/v2.6.0
git submodule sync
git submodule update --init --recursive

I’m using Miniconda to set up a python 3.13 environment:

conda create -y -n pytorch-build python=3.13
conda activate pytorch-build
conda install cmake ninja
pip install -r requirements.txt
pip install mkl-static mkl-include

Oh shit, there is no mkl-static wheel for my platform? Indeed, checking all mkl-static versions on PyPI, they all seemingly require macOS 10.15. No way, let me check the wheels one by one. Cue the Cat Looks Inside meme. Gotcha: mkl-static 2023.1.0.
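
To pull the wheels down for inspection without installing them, something like this should work (pip can download for a foreign platform tag as long as you restrict it to binaries):

pip download --only-binary=:all: --no-deps -d ./wheels \
    --platform macosx_10_15_x86_64 \
    mkl-static==2023.1.0 mkl-include==2023.1.0 intel-openmp==2023.1.0 tbb==2021.10.0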

$ uv run otool.py mkl_static-2023.1.0-py2.py3-none-macosx_10_15_x86_64.macosx_11_0_x86_64/
10.14   n/a     mkl_static-2023.1.0.data/data/lib/libmkl_core.a
10.14   n/a     mkl_static-2023.1.0.data/data/lib/libmkl_intel_lp64.a
10.14   n/a     mkl_static-2023.1.0.data/data/lib/libmkl_intel_ilp64.a

Bruh. Intel, what are you doing? But hey, I’m happy to have found a version that should work with macOS 10.14 (right?). So I change the tag in the mkl_static-2023.1.0.dist-info/WHEEL file of the unpacked wheel and re-pack it with a small python script using from wheel.cli.pack import pack. Same for mkl-include 2023.1.0, intel-openmp 2023.1.0 and tbb 2021.10.0, which are also dependencies.
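
In case it helps anyone, the re-tagging flow was roughly this, shown with the wheel CLI instead of my throwaway script (check the actual Tag: lines in your WHEEL file before editing, mine are from memory):

python -m wheel unpack mkl_static-2023.1.0-py2.py3-none-macosx_10_15_x86_64.macosx_11_0_x86_64.whl
# edit mkl_static-2023.1.0/mkl_static-2023.1.0.dist-info/WHEEL and change the Tag: line(s),
# e.g. to:  Tag: py2.py3-none-macosx_10_14_x86_64
python -m wheel pack mkl_static-2023.1.0
pip install ./mkl_static-2023.1.0-py2.py3-none-macosx_10_14_x86_64.whl   # file name follows the new tag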

FYI

$ uv run otool.py tbb-2021.10.0-py2.py3-none-macosx_10_15_x86_64.macosx_11_0_x86_64/
10.11   10.13   tbb-2021.10.0.data/data/lib/libtbbmalloc.2.10.dylib
10.11   10.13   tbb-2021.10.0.data/data/lib/libtbbmalloc_proxy.2.dylib
10.11   10.13   tbb-2021.10.0.data/data/lib/libtbbmalloc.dylib
10.11   10.13   tbb-2021.10.0.data/data/lib/libtbb.12.dylib
10.11   10.13   tbb-2021.10.0.data/data/lib/libtbbmalloc.2.dylib
10.11   10.13   tbb-2021.10.0.data/data/lib/libtbbmalloc_proxy.2.10.dylib
10.11   10.13   tbb-2021.10.0.data/data/lib/libtbbmalloc_proxy.dylib
10.11   10.13   tbb-2021.10.0.data/data/lib/libtbb.dylib
10.11   10.13   tbb-2021.10.0.data/data/lib/libtbb.12.10.dylib
$ uv run otool.py intel_openmp-2023.1.0-py2.py3-none-macosx_10_15_x86_64.macosx_11_0_x86_64/
10.13.6 10.14   intel_openmp-2023.1.0.data/data/lib/libiomp5.dylib
10.13.6 10.14   intel_openmp-2023.1.0.data/data/lib/libiomp5_db.dylib
10.13.6 10.14   intel_openmp-2023.1.0.data/data/lib/libiompstubs5.dylib

So after installing these, let’s get to building pytorch! Let’s set our environment variables:

export MAX_JOBS="6"
export BUILD_TEST="0" 
export BUILD_TYPE="Release"
export DEBUG="0"
export CFLAGS="-mmacosx-version-min=10.14"
export CXXFLAGS="-mmacosx-version-min=10.14"
export LDFLAGS="-mmacosx-version-min=10.14"
export MACOSX_DEPLOYMENT_TARGET="10.14"
export MACOS_DEPLOYMENT_TARGET="10.14"
export PYTORCH_BUILD_VERSION="2.6.0"
export PYTORCH_BUILD_NUMBER="1"
export USE_MPS="0"
export USE_CUDA="0"
export USE_MKL="1"
export USE_MKLDNN="1" 
export USE_DISTRIBUTED="0"
export USE_NATIVE_ARCH="0"
export WERROR="1"

Not sure if all of these make sense tbh. I thought that using MPS on Intel Macs doesn’t make sense, but when inspecting the last official pytorch build for Intel Macs I get this:

>>> print(torch.__config__.show())
PyTorch built with:
  - GCC 4.2
  - C++ Version: 201703
  - clang 13.1.6
  - Intel(R) oneAPI Math Kernel Library Version 2022.2-Product Build 20220801 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v3.3.2 (Git Hash 2dc95a2ad0841e29db8b22fbccaf3e5da7992b01)
  - OpenMP 201811
  - LAPACK is enabled (usually provided by MKL)
  - NNPACK is enabled
  - CPU capability usage: NO AVX
  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CXX_COMPILER=/Applications/Xcode_13.3.1.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/c++, CXX_FLAGS= -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOCUPTI -DLIBKINETO_NOROCTRACER -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DUSE_PYTORCH_METAL_EXPORT -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DUSE_COREML_DELEGATE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=braced-scalar-init -Werror=range-loop-construct -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wvla-extension -Wsuggest-override -Wnewline-eof -Winconsistent-missing-override -Winconsistent-missing-destructor-override -Wno-pass-failed -Wno-error=pedantic -Wno-error=old-style-cast -Wno-error=inconsistent-missing-override -Wno-error=inconsistent-missing-destructor-override -Wconstant-conversion -Wno-invalid-partial-specialization -Wno-missing-braces -Qunused-arguments -fcolor-diagnostics -faligned-new -Wno-unused-but-set-variable -fno-math-errno -fno-trapping-math -Werror=format -DUSE_MPS -Wno-unused-private-field -Wno-missing-braces, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=2.2.2, USE_CUDA=OFF, USE_CUDNN=OFF, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=OFF, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF, USE_ROCM_KERNEL_ASSERT=OFF,

which has -DUSE_MPS in its CXX_FLAGS, so should it actually be enabled? Anyway, let’s start the build:

python setup.py bdist_wheel --plat-name=macosx_10_14_x86_64

Ah come on! The Clang provided by Apple for macOS 10.14 does not support one of the AVX512-related compiler flags:

[321/7319] Building C object confu-deps/XNNPACK/CMakeFiles/microkerne...od.dir/src/f16-gemm/gen/f16-gemm-1x64-minmax-avx512fp16-broadcast.c.o
FAILED: confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f16-gemm/gen/f16-gemm-1x64-minmax-avx512fp16-broadcast.c.o
/Library/Developer/CommandLineTools/usr/bin/cc -DFXDIV_USE_INLINE_ASSEMBLY=0 -DXNN_ENABLE_ARM_BF16=0 -DXNN_ENABLE_ARM_DOTPROD=0 -DXNN_ENABLE_ARM_FP16_SCALAR=0 -DXNN_ENABLE_ARM_FP16_VECTOR=0 -DXNN_ENABLE_ARM_I8MM=0 -DXNN_ENABLE_ARM_SME2=0 -DXNN_ENABLE_ARM_SME=0 -DXNN_ENABLE_ASSEMBLY=1 -DXNN_ENABLE_AVX256SKX=1 -DXNN_ENABLE_AVX256VNNI=1 -DXNN_ENABLE_AVX256VNNIGFNI=1 -DXNN_ENABLE_AVX512AMX=1 -DXNN_ENABLE_AVX512F=1 -DXNN_ENABLE_AVX512FP16=1 -DXNN_ENABLE_AVX512SKX=1 -DXNN_ENABLE_AVX512VBMI=1 -DXNN_ENABLE_AVX512VNNI=1 -DXNN_ENABLE_AVX512VNNIGFNI=1 -DXNN_ENABLE_AVXVNNI=0 -DXNN_ENABLE_AVXVNNIINT8=0 -DXNN_ENABLE_CPUINFO=1 -DXNN_ENABLE_DWCONV_MULTIPASS=0 -DXNN_ENABLE_GEMM_M_SPECIALIZATION=1 -DXNN_ENABLE_HVX=1 -DXNN_ENABLE_KLEIDIAI=0 -DXNN_ENABLE_MEMOPT=1 -DXNN_ENABLE_RISCV_VECTOR=1 -DXNN_ENABLE_SPARSE=1 -DXNN_ENABLE_VSX=1 -I/Users/user/Documents/pytorch/third_party/XNNPACK/include -I/Users/user/Documents/pytorch/third_party/XNNPACK/src -I/Users/user/Documents/pytorch/third_party/pthreadpool/include -I/Users/user/Documents/pytorch/third_party/FXdiv/include -isystem /Users/user/Documents/pytorch/third_party/protobuf/src -mmacosx-version-min=10.14 -O3 -DNDEBUG -std=c99 -isysroot /Library/Developer/CommandLineTools/SDKs/MacOSX10.14.sdk -mmacosx-version-min=10.14 -fPIC -O2  -mf16c -mfma -mavx512f -mavx512cd -mavx512bw -mavx512dq -mavx512vl -mavx512vnni -mgfni -mavx512fp16 -MD -MT confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f16-gemm/gen/f16-gemm-1x64-minmax-avx512fp16-broadcast.c.o -MF confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f16-gemm/gen/f16-gemm-1x64-minmax-avx512fp16-broadcast.c.o.d -o confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f16-gemm/gen/f16-gemm-1x64-minmax-avx512fp16-broadcast.c.o -c /Users/user/Documents/pytorch/third_party/XNNPACK/src/f16-gemm/gen/f16-gemm-1x64-minmax-avx512fp16-broadcast.c
clang: error: unknown argument: '-mavx512fp16'

No problem, let’s install MacPorts and get clang-17.
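
Roughly (port and select names are from memory, double-check with port select --list clang):

sudo port install clang-17
sudo port select --set clang mp-clang-17   # makes /opt/local/bin/clang and clang++ point at clang-17

Now set some environment variables to use the new clang: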

export CC=/opt/local/bin/clang
export CXX=/opt/local/bin/clang++
export CXX_COMPILER="/opt/local/bin/clang++"

and try again! Don’t forget to clean your build directory.

Oh well. The file build/CMakeFiles/microkernels-all.rsp actually exists, so no idea what the issue is here. (My guess: the MacPorts ar doesn’t understand @response-file arguments and treats @CMakeFiles/microkernels-all.rsp as a literal file name, but I haven’t verified that.)

[4657/7319] Linking C static library lib/libmicrokernels-all.a
FAILED: lib/libmicrokernels-all.a
: && /Users/user/miniconda3/envs/pytorch-build/lib/python3.13/site-packages/cmake/data/bin/cmake -E rm -f lib/libmicrokernels-all.a && /opt/local/bin/ar qc lib/libmicrokernels-all.a  @CMakeFiles/microkernels-all.rsp && /opt/local/bin/ranlib lib/libmicrokernels-all.a && /Users/user/miniconda3/envs/pytorch-build/lib/python3.13/site-packages/cmake/data/bin/cmake -E touch lib/libmicrokernels-all.a && :
ar: @CMakeFiles/microkernels-all.rsp: No such file or directory

Fuck it. Doing this in a macOS 10.14 VM feels like a dead end. Why not just use Apple Rosetta and compile pytorch for x86_64 macOS on my M-series MacBook? We can actually get an x86_64 conda environment like this:

conda create --platform osx-64 --name pytorch-build-x86_64 python=3.13

and do the whole setup again. Don’t forget to run everything with Rosetta. What could possibly go wrong? Let’s go!
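
A quick sanity check that the interpreter in that environment really runs as x86_64:

arch -x86_64 python -c "import platform; print(platform.machine())"   # should print x86_64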

arch -x86_64 python setup.py bdist_wheel --plat-name=macosx_10_14_x86_64

Sigh… Okay, so there’s actually quite a bit of code that uses APIs which aren’t available before macOS 10.15, so with a 10.14 deployment target you’ll get errors like this

[4925/7319] Building CXX object third_party/onnx/CMakeFiles/onnx.dir/onnx/checker.cc.o
FAILED: third_party/onnx/CMakeFiles/onnx.dir/onnx/checker.cc.o 
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/c++ -DONNXIFI_ENABLE_EXT=1 -DONNX_ML=1 -DONNX_NAMESPACE=onnx_torch -D__STDC_FORMAT_MACROS -I/Users/user/Documents/Repos/pytorch/cmake/../third_party/benchmark/include -I/Users/user/Documents/Repos/pytorch/third_party/onnx -I/Users/user/Documents/Repos/pytorch/build/third_party/onnx -isystem /Users/user/Documents/Repos/pytorch/cmake/../third_party/googletest/googlemock/include -isystem /Users/user/Documents/Repos/pytorch/cmake/../third_party/googletest/googletest/include -isystem /Users/user/Documents/Repos/pytorch/third_party/protobuf/src -isystem /Users/user/Documents/Repos/pytorch/third_party/XNNPACK/include -isystem /Users/user/Documents/Repos/pytorch/third_party/ittapi/include -isystem /Users/user/Documents/Repos/pytorch/cmake/../third_party/eigen -mmacosx-version-min=10.14 -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -Wnon-virtual-dtor -O3 -DNDEBUG -std=gnu++17 -arch x86_64 -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX15.0.sdk -mmacosx-version-min=10.14 -fPIC -MD -MT third_party/onnx/CMakeFiles/onnx.dir/onnx/checker.cc.o -MF third_party/onnx/CMakeFiles/onnx.dir/onnx/checker.cc.o.d -o third_party/onnx/CMakeFiles/onnx.dir/onnx/checker.cc.o -c /Users/user/Documents/Repos/pytorch/third_party/onnx/onnx/checker.cc
In file included from /Users/user/Documents/Repos/pytorch/third_party/onnx/onnx/checker.cc:15:
/Users/user/Documents/Repos/pytorch/third_party/onnx/onnx/common/file_utils.h:20:20: error: 'path' is unavailable: introduced in macOS 10.15
   20 |   std::filesystem::path proto_u8_path = std::filesystem::u8path(proto_path);
      |                    ^

and that

In file included from /Users/user/Documents/Repos/pytorch/aten/src/ATen/native/mkl/SpectralOps.cpp:207:
/Users/user/Documents/Repos/pytorch/third_party/pocketfft/pocketfft_hdronly.h:159:15: error: 'aligned_alloc' is only available on macOS 10.15 or newer [-Werror,-Wunguarded-availability-new]
  159 |   void *ptr = ::aligned_alloc(align,(size+align-1)&(~(align-1)));
      |               ^~~~~~~~~~~~~~~
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX15.0.sdk/usr/include/malloc/_malloc.h:65:35: note: 'aligned_alloc' has been marked as being introduced in macOS 10.15 here, but the deployment target is macOS 10.14.0
   65 | void * __sized_by_or_null(__size) aligned_alloc(size_t __alignment, size_t __size) __result_use_check __alloc_align(1) __alloc_size(2) _MALLOC_TYPED(malloc_type_aligned_alloc, 2) __OSX_AVAILABLE(10.15) __IOS_AVAILABLE(13.0) __TVOS_AVAILABLE(13.0) __WATCHOS_AVAILABLE(6.0);
      |                                   ^
/Users/user/Documents/Repos/pytorch/third_party/pocketfft/pocketfft_hdronly.h:159:15: note: enclose 'aligned_alloc' in a __builtin_available check to silence this warning
  159 |   void *ptr = ::aligned_alloc(align,(size+align-1)&(~(align-1)));
      |               ^~~~~~~~~~~~~~~
  160 |   if (!ptr) throw std::bad_alloc();
  161 |   return ptr;
      |  

While one could patch third_party/onnx/onnx/common/file_utils.h to support macOS 10.14 (by throwing out unicode support for paths, lul), I surely won’t touch that aligned_alloc in pocketfft that gets pulled in via aten/src/ATen/native/mkl/SpectralOps.cpp. Or would that be feasible? I don’t think diverging from upstream is a good idea anyway. So yeah, fuck. I don’t think we’ll ever see the latest pytorch on macOS 10.14 again. Let’s raise the deployment target to macOS 10.15 instead. I’m sure not all of these are needed, but whatever.

export CFLAGS="-mmacosx-version-min=10.15"
export CXXFLAGS="-mmacosx-version-min=10.15"
export LDFLAGS="-mmacosx-version-min=10.15"
export MACOSX_DEPLOYMENT_TARGET="10.15"
export MACOS_DEPLOYMENT_TARGET="10.15"

Ready to be disappointed again, let’s run

arch -x86_64 python setup.py bdist_wheel --plat-name=macosx_10_15_x86_64

Wow, that actually worked :partying_face: Checking the compiled tests doesn’t look too bad. Or does it?

$ cd build
$ arch -x86_64 ctest

...

97% tests passed, 3 tests failed out of 112

Total Test time (real) = 102.66 sec

The following tests FAILED:
         48 - vec_test_all_types_AVX512 (ILLEGAL)
         49 - vec_test_all_types_AVX2 (ILLEGAL)
         80 - scalar_tensor_test (Failed)
Errors while running CTest
Output from these tests are in: /Users/user/Documents/Repos/pytorch/build/Testing/Temporary/LastTest.log
Use "--rerun-failed --output-on-failure" to re-run the failed cases verbosely.
$ arch -x86_64 ctest --rerun-failed --output-on-failure
Test project /Users/user/Documents/Repos/pytorch/build
    Start 48: vec_test_all_types_AVX512
1/3 Test #48: vec_test_all_types_AVX512 ........***Exception: Illegal  0.03 sec

    Start 49: vec_test_all_types_AVX2
2/3 Test #49: vec_test_all_types_AVX2 ..........***Exception: Illegal  0.48 sec
Running main() from /Users/user/Documents/Repos/pytorch/third_party/googletest/googletest/src/gtest_main.cc

...

    Start 80: scalar_tensor_test
3/3 Test #80: scalar_tensor_test ...............***Failed    0.65 sec
Running main() from /Users/user/Documents/Repos/pytorch/third_party/googletest/googletest/src/gtest_main.cc
[==========] Running 3 tests from 1 test suite.
[----------] Global test environment set-up.
[----------] 3 tests from TestScalarTensor
[ RUN      ] TestScalarTensor.TestScalarTensorCPU
[       OK ] TestScalarTensor.TestScalarTensorCPU (58 ms)
[ RUN      ] TestScalarTensor.TestScalarTensorCUDA
[       OK ] TestScalarTensor.TestScalarTensorCUDA (0 ms)
[ RUN      ] TestScalarTensor.TestScalarTensorMPS
/Users/user/Documents/Repos/pytorch/aten/src/ATen/test/scalar_tensor_test.cpp:229: Failure
Expected equality of these values:
  lhs.numel()
    Which is: 1
  0

[  FAILED  ] TestScalarTensor.TestScalarTensorMPS (178 ms)
[----------] 3 tests from TestScalarTensor (236 ms total)

[----------] Global test environment tear-down
[==========] 3 tests from 1 test suite ran. (237 ms total)
[  PASSED  ] 2 tests.
[  FAILED  ] 1 test, listed below:
[  FAILED  ] TestScalarTensor.TestScalarTensorMPS

 1 FAILED TEST


0% tests passed, 3 tests failed out of 3

Total Test time (real) =   1.19 sec

The following tests FAILED:
         48 - vec_test_all_types_AVX512 (ILLEGAL)
         49 - vec_test_all_types_AVX2 (ILLEGAL)
         80 - scalar_tensor_test (Failed)
Errors while running CTest

Eh, I don’t mind tests 48 and 49, but 80 worries me a bit. What do y’all think?
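
Since the build was done with USE_MPS=0, an MPS-flavored test case failing doesn’t shock me, but I’d love confirmation that this is expected. To double-check that only the MPS case is broken, you can filter it out with the standard googletest flag (binary path assumes the usual build/bin layout):

cd build
./bin/scalar_tensor_test --gtest_filter='-*MPS*'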

After installing the wheel from dist and importing torch, I check the config output:

>>> print(torch.__config__.show())
PyTorch built with:
  - GCC 4.2
  - C++ Version: 201703
  - clang 16.0.0
  - Intel(R) MKL-DNN v3.5.3 (Git Hash 66f0cb9eb66affd2da3bf5f8d897376f04aae6af)
  - LAPACK is enabled (usually provided by MKL)
  - NNPACK is enabled
  - CPU capability usage: NO AVX
  - Build settings: BLAS_INFO=accelerate, BUILD_TYPE=Release, COMMIT_SHA=1eba9b3aa3c43f86f4a2c807ac8e12c4a7767340, CXX_COMPILER=/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/c++, CXX_FLAGS= -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOCUPTI -DLIBKINETO_NOROCTRACER -DLIBKINETO_NOXPUPTI=ON -DUSE_FBGEMM -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=braced-scalar-init -Werror=range-loop-construct -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-unused-parameter -Wno-strict-overflow -Wno-strict-aliasing -Wvla-extension -Wsuggest-override -Wnewline-eof -Winconsistent-missing-override -Winconsistent-missing-destructor-override -Wno-pass-failed -Wno-error=old-style-cast -Wconstant-conversion -Wno-missing-braces -Qunused-arguments -fcolor-diagnostics -faligned-new -Wno-unused-but-set-variable -fno-math-errno -fno-trapping-math -Werror=format -DUSE_MPS -Wno-unused-private-field -Wno-missing-braces, LAPACK_INFO=accelerate, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, TORCH_VERSION=2.6.0, USE_CUDA=OFF, USE_CUDNN=OFF, USE_CUSPARSELT=OFF, USE_EIGEN_FOR_BLAS=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_GLOO=OFF, USE_MKL=OFF, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=OFF, USE_NNPACK=ON, USE_OPENMP=OFF, USE_ROCM=OFF, USE_ROCM_KERNEL_ASSERT=OFF,

Not exactly what I expected: why are USE_MKL=OFF and USE_OPENMP=OFF? What do you think, does anything else look wrong?
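
I guess the next step is to grep the CMake cache in the build directory to see what the BLAS/OpenMP detection actually decided (BLAS_INFO=accelerate above suggests it fell back to Accelerate instead of MKL):

grep -iE 'mkl|openmp|blas' build/CMakeCache.txt
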
But hey, it’s a start. Let’s run the python tests

pip install -r .ci/docker/requirements-ci.txt
python test/run_test.py --verbose --keep-going
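
A lower-effort sanity check I might try instead of the full harness: run a couple of the core suites directly (the test files are plain pytest/unittest modules, though I don’t know whether this is the blessed way):

python -m pytest test/test_torch.py -x -q
python -m pytest test/test_nn.py -x -q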

The full run is taking so much time, I dunno. I’m seeing failures (stuff related to MPS, for example), but I’m not sure how informative any of this really is right now. Is there a better way to test whether I compiled pytorch properly? All in all this feels very messy and I’m not too confident I did everything right here. Inspecting the binaries in the resulting wheel also gives strange results:

$ uv run python otool.py <path_to_unpacked_wheel>       
10.15   15.0    torch/_C.cpython-313-darwin.so
10.15   15.0    functorch/_C.cpython-313-darwin.so
15.0    15.0    torch/test/c10_ConstexprCrc_test
15.0    15.0    torch/test/dispatch_key_set_test
15.0    15.0    torch/test/type_test
15.0    15.0    torch/test/cpu_allocator_test
15.0    15.0    torch/test/weakref_test
15.0    15.0    torch/test/c10_string_view_test
15.0    15.0    torch/test/c10_exception_test
15.0    15.0    torch/test/packedtensoraccessor_test
15.0    15.0    torch/test/quantized_test
15.0    15.0    torch/test/c10_small_vector_test
15.0    15.0    torch/test/type_ptr_test
15.0    15.0    torch/test/c10_error_test
15.0    15.0    torch/test/c10_SizesAndStrides_test
15.0    15.0    torch/test/scalar_test
15.0    15.0    torch/test/c10_ordered_preserving_dict_test
15.0    15.0    torch/test/math_kernel_test
15.0    15.0    torch/test/kernel_stackbased_test
15.0    15.0    torch/test/c10_Metaprogramming_test
15.0    15.0    torch/test/MaybeOwned_test
15.0    15.0    torch/test/c10_ArrayRef_test
15.0    15.0    torch/test/operator_name_test
15.0    15.0    torch/test/c10_Synchronized_test
15.0    15.0    torch/test/inline_container_test
15.0    15.0    torch/test/c10_ssize_test
15.0    15.0    torch/test/c10_cow_test
15.0    15.0    torch/test/Dimname_test
15.0    15.0    torch/test/cpu_rng_test
15.0    15.0    torch/test/kernel_lambda_legacy_test
15.0    15.0    torch/test/mps_test_objc_interface
15.0    15.0    torch/test/mps_test_allocator
15.0    15.0    torch/test/dlconvertor_test
15.0    15.0    torch/test/cpu_profiling_allocator_test
15.0    15.0    torch/test/c10_intrusive_ptr_test
15.0    15.0    torch/test/pow_test
15.0    15.0    torch/test/c10_DispatchKeySet_test
15.0    15.0    torch/test/c10_NetworkFlow_test
15.0    15.0    torch/test/backend_fallback_test
15.0    15.0    torch/test/c10_InlineStreamGuard_test
15.0    15.0    torch/test/c10_optional_test
15.0    15.0    torch/test/c10_TypeIndex_test
15.0    15.0    torch/test/undefined_tensor_test
15.0    15.0    torch/test/basic
15.0    15.0    torch/test/List_test
15.0    15.0    torch/test/c10_SymInt_test
15.0    15.0    torch/test/c10_intrusive_ptr_benchmark
15.0    15.0    torch/test/extension_backend_test
15.0    15.0    torch/test/c10_Bitset_test
15.0    15.0    torch/test/thread_init_test
15.0    15.0    torch/test/kernel_function_legacy_test
15.0    15.0    torch/test/apply_utils_test
15.0    15.0    torch/test/make_boxed_from_unboxed_functor_test
15.0    15.0    torch/test/c10_Scalar_test
15.0    15.0    torch/test/legacy_vmap_test
15.0    15.0    torch/test/c10_DeviceGuard_test
15.0    15.0    torch/test/CppSignature_test
15.0    15.0    torch/test/reportMemoryUsage_test
15.0    15.0    torch/test/lazy_tensor_test
15.0    15.0    torch/test/mps_test_metal_library
15.0    15.0    torch/test/c10_string_util_test
15.0    15.0    torch/test/reduce_ops_test
15.0    15.0    torch/test/stride_properties_test
15.0    15.0    torch/test/c10_StreamGuard_test
15.0    15.0    torch/test/IListRef_test
15.0    15.0    torch/test/NamedTensor_test
15.0    15.0    torch/test/verify_api_visibility
15.0    15.0    torch/test/test_parallel
15.0    15.0    torch/test/operators_test
15.0    15.0    torch/test/op_allowlist_test
15.0    15.0    torch/test/c10_bit_cast_test
15.0    15.0    torch/test/mps_test_print
15.0    15.0    torch/test/scalar_tensor_test
15.0    15.0    torch/test/c10_Half_test
15.0    15.0    torch/test/c10_registry_test
15.0    15.0    torch/test/xla_tensor_test
15.0    15.0    torch/test/half_test
15.0    15.0    torch/test/c10_complex_math_test
15.0    15.0    torch/test/c10_DeadlockDetection_test
15.0    15.0    torch/test/c10_accumulate_test
15.0    15.0    torch/test/c10_ThreadLocal_test
15.0    15.0    torch/test/native_test
15.0    15.0    torch/test/c10_TypeList_test
15.0    15.0    torch/test/c10_bfloat16_test
15.0    15.0    torch/test/c10_InlineDeviceGuard_test
15.0    15.0    torch/test/wrapdim_test
15.0    15.0    torch/test/op_registration_test
15.0    15.0    torch/test/c10_lazy_test
15.0    15.0    torch/test/atest
15.0    15.0    torch/test/c10_generic_math_test
15.0    15.0    torch/test/kernel_function_test
15.0    15.0    torch/test/kernel_lambda_test
15.0    15.0    torch/test/memory_overlapping_test
15.0    15.0    torch/test/Dict_test
15.0    15.0    torch/test/c10_irange_test
15.0    15.0    torch/test/mobile_memory_cleanup
15.0    15.0    torch/test/c10_tempfile_test
15.0    15.0    torch/test/c10_CompileTimeFunctionPointer_test
15.0    15.0    torch/test/StorageUtils_test
15.0    15.0    torch/test/c10_Device_test
15.0    15.0    torch/test/broadcast_test
15.0    15.0    torch/test/c10_LeftRight_test
15.0    15.0    torch/test/ivalue_test
15.0    15.0    torch/test/c10_flags_test
15.0    15.0    torch/test/c10_TypeTraits_test
15.0    15.0    torch/test/KernelFunction_test
15.0    15.0    torch/test/memory_format_test
15.0    15.0    torch/test/c10_logging_test
15.0    15.0    torch/test/tensor_iterator_test
15.0    15.0    torch/test/cpu_generator_test
15.0    15.0    torch/test/c10_typeid_test
15.0    15.0    torch/test/c10_complex_test
15.0    15.0    torch/bin/test_tensorexpr
15.0    15.0    torch/bin/test_lazy
10.15   15.0    torch/bin/protoc-3.13.0.0
10.15   15.0    torch/bin/torch_shm_manager
15.0    15.0    torch/bin/tutorial_tensorexpr
15.0    15.0    torch/bin/test_edge_op_registration
15.0    15.0    torch/bin/test_jit
15.0    15.0    torch/bin/test_api
10.15   15.0    torch/bin/protoc
10.15   15.0    torch/lib/libtorch_python.dylib
15.0    15.0    torch/lib/libbackend_with_compiler.dylib
10.15   15.0    torch/lib/libtorch.dylib
10.15   15.0    torch/lib/libtorch_global_deps.dylib
10.15   15.0    torch/lib/libtorch_cpu.dylib
15.0    15.0    torch/lib/libjitbackend_test.dylib
10.15   15.0    torch/lib/libc10.dylib
15.0    15.0    torch/lib/libtorchbind_test.dylib
10.15   15.0    torch/lib/libshm.dylib
15.0    15.0    torch/lib/libaoti_custom_ops.dylib

Lots of test stuff is included, which I thought I had excluded with export BUILD_TEST="0", and there are also plenty of binaries that require at least macOS 15, which is the version I’m currently on. Bummer. How can I avoid including all the test binaries? And why is it so hard to have every binary respect the minimum macOS version I specified?

So I’m very interested in hearing your suggestions, advice and opinions. Feel free to ask if you have questions about what I did here. Looking forward to getting this community effort started :slight_smile:

For visibility: the fine people at GitHub - conda-forge/pytorch-cpu-feedstock: A conda-smithy repository for pytorch-cpu seem to have a very good idea of how to build pytorch, even for x86_64 macOS. Unfortunately this lives in the conda ecosystem and I have no idea whether one could build a normal wheel file with it. Btw, they also “only” support macOS 10.15+ :slight_smile:

Edit: It seems they actually support macOS 10.13+, see [RFC] macOS x86 builds / test deprecation · Issue #114602 · pytorch/pytorch · GitHub