Oh well, fuck it, I guess I'll just share what I tried and found out so far. Get ready for some dumb stuff. This is the community support @seemethere was talking about, lmao! This is probably full of silly mistakes and I'd love to be corrected. Here we go!
First off, one of my goals is to build PyTorch to run on macOS versions going all the way back to 10.14, so I'm using `otool -l <some_binary>` to check which macOS version a binary requires at minimum and verify I'm on the right track. In the `otool` output you have to look for the `LC_BUILD_VERSION` load command and then its `minos` field. For older binaries you have to look for the `LC_VERSION_MIN_MACOSX` command and its `version` field instead. Don't we love Apple renaming stuff?
Checking all binaries in a directory by hand sucks, so here's a script you can call like `python3 otool.py <path>` (thanks, ChatGPT).
```python
#!/usr/bin/env python3
import argparse
import subprocess
import sys
from pathlib import Path


def parse_mach_o_versions(file_path: Path):
    try:
        output = subprocess.check_output(["otool", "-l", str(file_path)], stderr=subprocess.DEVNULL)
        lines = output.decode("utf-8", errors="replace").splitlines()
    except (subprocess.CalledProcessError, FileNotFoundError):
        return None, None

    minos = None
    sdk = None
    inside_version_min_macosx = False
    inside_build_version = False

    for line in lines:
        line_stripped = line.strip()
        if line_stripped.startswith("cmd LC_VERSION_MIN_MACOSX"):
            inside_version_min_macosx = True
            inside_build_version = False
            continue
        elif line_stripped.startswith("cmd LC_BUILD_VERSION"):
            inside_build_version = True
            inside_version_min_macosx = False
            continue

        if inside_version_min_macosx:
            if line_stripped.startswith("version "):
                _, ver_value = line_stripped.split(maxsplit=1)
                minos = ver_value.strip()
            elif line_stripped.startswith("sdk "):
                _, sdk_value = line_stripped.split(maxsplit=1)
                sdk = sdk_value.strip()
            if minos and sdk:
                break

        if inside_build_version:
            if line_stripped.startswith("minos "):
                _, ver_value = line_stripped.split(maxsplit=1)
                minos = ver_value.strip()
            elif line_stripped.startswith("sdk "):
                _, sdk_value = line_stripped.split(maxsplit=1)
                sdk = sdk_value.strip()
            if minos and sdk:
                break

    return minos, sdk


def main():
    parser = argparse.ArgumentParser(
        description="Recursively find Mach-O binaries and show their minimum macOS version and SDK."
    )
    parser.add_argument("path", type=str, help="Path to a file or directory.")
    args = parser.parse_args()

    root_path = Path(args.path).resolve()
    if not root_path.exists():
        print(f"Error: Path '{root_path}' does not exist.", file=sys.stderr)
        sys.exit(1)

    if root_path.is_dir():
        for file_path in root_path.rglob("*"):
            if file_path.is_file():
                minos, sdk = parse_mach_o_versions(file_path)
                if minos and sdk:
                    print(f"{minos}\t{sdk}\t{file_path.relative_to(root_path)}")
    elif root_path.is_file():
        minos, sdk = parse_mach_o_versions(root_path)
        if minos and sdk:
            print(f"{minos}\t{sdk}\t{root_path}")
    else:
        print(f"Error: '{root_path}' is not a file.", file=sys.stderr)
        sys.exit(1)


if __name__ == "__main__":
    main()
```
So let's unzip `torch-2.2.2-cp312-none-macosx_10_9_x86_64.whl` and run the script on its contents.
```
$ uv run otool.py torch-2.2.2-cp312-none-macosx_10_9_x86_64/
10.13 12.3 torch/_C.cpython-312-darwin.so
10.13 12.3 functorch/_C.cpython-312-darwin.so
10.13.6 10.14 functorch/.dylibs/libiomp5.dylib
10.13 12.3 torch/bin/protoc-3.13.0.0
10.13 12.3 torch/bin/torch_shm_manager
10.13 12.3 torch/bin/protoc
10.13 12.3 torch/lib/libtorch_python.dylib
10.13 12.3 torch/lib/libtorch.dylib
10.13 12.3 torch/lib/libtorch_global_deps.dylib
10.13.6 10.14 torch/lib/libiomp5.dylib
10.13 12.3 torch/lib/libtorch_cpu.dylib
10.13 12.3 torch/lib/libc10.dylib
10.13 12.3 torch/lib/libshm.dylib
```
Sus. So this `macosx_10_9_x86_64` in the name of the wheel file was a lie? Dunno, did someone on macOS 10.9 (rofl) ever try to run this?
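Side note on how that tag works: as far as I know, pip only ever looks at the platform tag in the filename when deciding whether a wheel is installable; nothing cross-checks it against the actual `minos` baked into the bundled binaries. A tiny hypothetical helper to pull such a tag apart (`parse_macos_platform_tag` is my own name, not a real API):

```python
def parse_macos_platform_tag(platform_tag: str):
    """Split e.g. 'macosx_10_9_x86_64' into ((10, 9), 'x86_64')."""
    parts = platform_tag.split("_")
    if parts[0] != "macosx":
        raise ValueError(f"not a macOS platform tag: {platform_tag!r}")
    major, minor = int(parts[1]), int(parts[2])
    # The arch itself may contain underscores (e.g. x86_64), so rejoin the rest.
    arch = "_".join(parts[3:])
    return (major, minor), arch
```

So the wheel claims 10.9 while the binaries inside say 10.13: pip would presumably happily install it on 10.9, where it then just crashes.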
Anyway, let's get going and try to build it ourselves. It's kinda hard to get access to a real Intel Mac running macOS 10.14, so I'll just switch to a Linux machine and use quickemu to spin up a macOS 10.14 VM. The quickemu documentation in its repo is actually really easy to understand and helpful. The VM runs painfully slowly, but that doesn't stop me from installing the Xcode Command Line Tools and cloning the PyTorch repo.
```
xcode-select --install
git clone --recursive https://github.com/pytorch/pytorch
cd pytorch
git checkout tags/v2.6.0
git submodule sync
git submodule update --init --recursive
```
I'm using Miniconda to set up a Python 3.13 environment:
```
conda create -y -n pytorch-build python=3.13
conda activate pytorch-build
conda install cmake ninja
pip install -r requirements.txt
pip install mkl-static mkl-include
```
Oh shit, there's no `mkl-static` wheel for my platform? Indeed, checking all `mkl-static` versions on PyPI, they all seemingly require macOS 10.15. No way, let me check the wheels one by one. Cue the Cat Looks Inside meme. Gotcha, `mkl-static` 2023.1.0:
```
$ uv run otool.py mkl_static-2023.1.0-py2.py3-none-macosx_10_15_x86_64.macosx_11_0_x86_64/
10.14 n/a mkl_static-2023.1.0.data/data/lib/libmkl_core.a
10.14 n/a mkl_static-2023.1.0.data/data/lib/libmkl_intel_lp64.a
10.14 n/a mkl_static-2023.1.0.data/data/lib/libmkl_intel_ilp64.a
```
Bruh. Intel, what are you doing? But hey, I'm happy to have found a version that should work with macOS 10.14 (right?). So change the tag in the `mkl_static-2023.1.0.dist-info/WHEEL` file of the unpacked wheel and re-pack it with a Python script using `from wheel.cli.pack import pack`. Same for `mkl-include` 2023.1.0, `intel-openmp` 2023.1.0 and `tbb` 2021.10.0, which are also dependencies.
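In case someone wants to reproduce the re-tagging: the `WHEEL` file is plain key-value text, so rewriting its `Tag:` lines is all the "patching" there is, and `wheel.cli.pack` then rebuilds the archive (and regenerates `RECORD`). A rough sketch, where `rewrite_wheel_tags` is my own helper name:

```python
def rewrite_wheel_tags(wheel_metadata: str, new_tag: str) -> str:
    """Replace every 'Tag:' line in an unpacked dist-info/WHEEL file."""
    lines = []
    for line in wheel_metadata.splitlines():
        if line.startswith("Tag:"):
            lines.append(f"Tag: {new_tag}")
        else:
            lines.append(line)
    return "\n".join(lines) + "\n"

# After editing <name>.dist-info/WHEEL inside the unpacked wheel directory,
# re-pack it with the wheel package, e.g.:
#   from wheel.cli.pack import pack
#   pack(directory="mkl_static-2023.1.0", dest_dir="dist", build_number=None)
```

Note that the platform tag also has to match in the wheel's filename, which `pack` derives from the `WHEEL` tags.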
FYI:

```
$ uv run otool.py tbb-2021.10.0-py2.py3-none-macosx_10_15_x86_64.macosx_11_0_x86_64/
10.11 10.13 tbb-2021.10.0.data/data/lib/libtbbmalloc.2.10.dylib
10.11 10.13 tbb-2021.10.0.data/data/lib/libtbbmalloc_proxy.2.dylib
10.11 10.13 tbb-2021.10.0.data/data/lib/libtbbmalloc.dylib
10.11 10.13 tbb-2021.10.0.data/data/lib/libtbb.12.dylib
10.11 10.13 tbb-2021.10.0.data/data/lib/libtbbmalloc.2.dylib
10.11 10.13 tbb-2021.10.0.data/data/lib/libtbbmalloc_proxy.2.10.dylib
10.11 10.13 tbb-2021.10.0.data/data/lib/libtbbmalloc_proxy.dylib
10.11 10.13 tbb-2021.10.0.data/data/lib/libtbb.dylib
10.11 10.13 tbb-2021.10.0.data/data/lib/libtbb.12.10.dylib
$ uv run otool.py intel_openmp-2023.1.0-py2.py3-none-macosx_10_15_x86_64.macosx_11_0_x86_64/
10.13.6 10.14 intel_openmp-2023.1.0.data/data/lib/libiomp5.dylib
10.13.6 10.14 intel_openmp-2023.1.0.data/data/lib/libiomp5_db.dylib
10.13.6 10.14 intel_openmp-2023.1.0.data/data/lib/libiompstubs5.dylib
```
So, after having installed these, let's get to building PyTorch! Let's set our environment variables:
```
export MAX_JOBS="6"
export BUILD_TEST="0"
export BUILD_TYPE="Release"
export DEBUG="0"
export CFLAGS="-mmacosx-version-min=10.14"
export CXXFLAGS="-mmacosx-version-min=10.14"
export LDFLAGS="-mmacosx-version-min=10.14"
export MACOSX_DEPLOYMENT_TARGET="10.14"
export MACOS_DEPLOYMENT_TARGET="10.14"
export PYTORCH_BUILD_VERSION="2.6.0"
export PYTORCH_BUILD_NUMBER="1"
export USE_MPS="0"
export USE_CUDA="0"
export USE_MKL="1"
export USE_MKLDNN="1"
export USE_DISTRIBUTED="0"
export USE_NATIVE_ARCH="0"
export WERROR="1"
```
Not sure if all of these make sense, tbh. I thought using MPS on Intel Macs doesn't make sense, but when inspecting the last official PyTorch build for Intel Macs I get this:
```
>>> print(torch.__config__.show())
PyTorch built with:
  - GCC 4.2
  - C++ Version: 201703
  - clang 13.1.6
  - Intel(R) oneAPI Math Kernel Library Version 2022.2-Product Build 20220801 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v3.3.2 (Git Hash 2dc95a2ad0841e29db8b22fbccaf3e5da7992b01)
  - OpenMP 201811
  - LAPACK is enabled (usually provided by MKL)
  - NNPACK is enabled
  - CPU capability usage: NO AVX
  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CXX_COMPILER=/Applications/Xcode_13.3.1.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/c++, CXX_FLAGS= -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOCUPTI -DLIBKINETO_NOROCTRACER -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DUSE_PYTORCH_METAL_EXPORT -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DUSE_COREML_DELEGATE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=braced-scalar-init -Werror=range-loop-construct -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wvla-extension -Wsuggest-override -Wnewline-eof -Winconsistent-missing-override -Winconsistent-missing-destructor-override -Wno-pass-failed -Wno-error=pedantic -Wno-error=old-style-cast -Wno-error=inconsistent-missing-override -Wno-error=inconsistent-missing-destructor-override -Wconstant-conversion -Wno-invalid-partial-specialization -Wno-missing-braces -Qunused-arguments -fcolor-diagnostics -faligned-new -Wno-unused-but-set-variable -fno-math-errno -fno-trapping-math -Werror=format -DUSE_MPS -Wno-unused-private-field -Wno-missing-braces, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=2.2.2, USE_CUDA=OFF, USE_CUDNN=OFF, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=OFF, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF, USE_ROCM_KERNEL_ASSERT=OFF,
```
which has `-DUSE_MPS`, so should it actually be enabled? Anyway, let's start the build:
```
python setup.py bdist_wheel --plat-name=macosx_10_14_x86_64
```
Ah, come on! The Clang provided by Apple for macOS 10.14 doesn't support some AVX-512-related compiler flags:
```
[321/7319] Building C object confu-deps/XNNPACK/CMakeFiles/microkerne...od.dir/src/f16-gemm/gen/f16-gemm-1x64-minmax-avx512fp16-broadcast.c.o
FAILED: confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f16-gemm/gen/f16-gemm-1x64-minmax-avx512fp16-broadcast.c.o
/Library/Developer/CommandLineTools/usr/bin/cc -DFXDIV_USE_INLINE_ASSEMBLY=0 -DXNN_ENABLE_ARM_BF16=0 -DXNN_ENABLE_ARM_DOTPROD=0 -DXNN_ENABLE_ARM_FP16_SCALAR=0 -DXNN_ENABLE_ARM_FP16_VECTOR=0 -DXNN_ENABLE_ARM_I8MM=0 -DXNN_ENABLE_ARM_SME2=0 -DXNN_ENABLE_ARM_SME=0 -DXNN_ENABLE_ASSEMBLY=1 -DXNN_ENABLE_AVX256SKX=1 -DXNN_ENABLE_AVX256VNNI=1 -DXNN_ENABLE_AVX256VNNIGFNI=1 -DXNN_ENABLE_AVX512AMX=1 -DXNN_ENABLE_AVX512F=1 -DXNN_ENABLE_AVX512FP16=1 -DXNN_ENABLE_AVX512SKX=1 -DXNN_ENABLE_AVX512VBMI=1 -DXNN_ENABLE_AVX512VNNI=1 -DXNN_ENABLE_AVX512VNNIGFNI=1 -DXNN_ENABLE_AVXVNNI=0 -DXNN_ENABLE_AVXVNNIINT8=0 -DXNN_ENABLE_CPUINFO=1 -DXNN_ENABLE_DWCONV_MULTIPASS=0 -DXNN_ENABLE_GEMM_M_SPECIALIZATION=1 -DXNN_ENABLE_HVX=1 -DXNN_ENABLE_KLEIDIAI=0 -DXNN_ENABLE_MEMOPT=1 -DXNN_ENABLE_RISCV_VECTOR=1 -DXNN_ENABLE_SPARSE=1 -DXNN_ENABLE_VSX=1 -I/Users/user/Documents/pytorch/third_party/XNNPACK/include -I/Users/user/Documents/pytorch/third_party/XNNPACK/src -I/Users/user/Documents/pytorch/third_party/pthreadpool/include -I/Users/user/Documents/pytorch/third_party/FXdiv/include -isystem /Users/user/Documents/pytorch/third_party/protobuf/src -mmacosx-version-min=10.14 -O3 -DNDEBUG -std=c99 -isysroot /Library/Developer/CommandLineTools/SDKs/MacOSX10.14.sdk -mmacosx-version-min=10.14 -fPIC -O2 -mf16c -mfma -mavx512f -mavx512cd -mavx512bw -mavx512dq -mavx512vl -mavx512vnni -mgfni -mavx512fp16 -MD -MT confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f16-gemm/gen/f16-gemm-1x64-minmax-avx512fp16-broadcast.c.o -MF confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f16-gemm/gen/f16-gemm-1x64-minmax-avx512fp16-broadcast.c.o.d -o confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f16-gemm/gen/f16-gemm-1x64-minmax-avx512fp16-broadcast.c.o -c /Users/user/Documents/pytorch/third_party/XNNPACK/src/f16-gemm/gen/f16-gemm-1x64-minmax-avx512fp16-broadcast.c
clang: error: unknown argument: '-mavx512fp16'
```
No problem, let's install MacPorts and get `clang-17`. Now set some environment variables to use the new Clang:
```
export CC=/opt/local/bin/clang
export CXX=/opt/local/bin/clang++
export CXX_COMPILER="/opt/local/bin/clang++"
```
and try again! Don't forget to clean your build directory. Oh well. The file `build/CMakeFiles/microkernels-all.rsp` actually exists, so no idea what the issue is here:
```
[4657/7319] Linking C static library lib/libmicrokernels-all.a
FAILED: lib/libmicrokernels-all.a
: && /Users/user/miniconda3/envs/pytorch-build/lib/python3.13/site-packages/cmake/data/bin/cmake -E rm -f lib/libmicrokernels-all.a && /opt/local/bin/ar qc lib/libmicrokernels-all.a @CMakeFiles/microkernels-all.rsp && /opt/local/bin/ranlib lib/libmicrokernels-all.a && /Users/user/miniconda3/envs/pytorch-build/lib/python3.13/site-packages/cmake/data/bin/cmake -E touch lib/libmicrokernels-all.a && :
ar: @CMakeFiles/microkernels-all.rsp: No such file or directory
```
Fuck it. Doing this in a macOS 10.14 VM feels like a dead end. Why not just use Apple's Rosetta and compile PyTorch for x86_64 macOS from my M-series MacBook? We can actually get an x86_64 conda environment like this:
```
conda create --platform osx-64 --name pytorch-build-x86_64 python=3.13
```
and do the whole setup again. Don't forget to run everything through Rosetta. What could possibly go wrong? Let's go!
```
arch -x86_64 python setup.py bdist_wheel --plat-name=macosx_10_14_x86_64
```
Sigh… okay, so there's actually already quite a bit of stuff that just isn't supported by the macOS 10.14 SDK. You'll get errors like this:
```
[4925/7319] Building CXX object third_party/onnx/CMakeFiles/onnx.dir/onnx/checker.cc.o
FAILED: third_party/onnx/CMakeFiles/onnx.dir/onnx/checker.cc.o
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/c++ -DONNXIFI_ENABLE_EXT=1 -DONNX_ML=1 -DONNX_NAMESPACE=onnx_torch -D__STDC_FORMAT_MACROS -I/Users/user/Documents/Repos/pytorch/cmake/../third_party/benchmark/include -I/Users/user/Documents/Repos/pytorch/third_party/onnx -I/Users/user/Documents/Repos/pytorch/build/third_party/onnx -isystem /Users/user/Documents/Repos/pytorch/cmake/../third_party/googletest/googlemock/include -isystem /Users/user/Documents/Repos/pytorch/cmake/../third_party/googletest/googletest/include -isystem /Users/user/Documents/Repos/pytorch/third_party/protobuf/src -isystem /Users/user/Documents/Repos/pytorch/third_party/XNNPACK/include -isystem /Users/user/Documents/Repos/pytorch/third_party/ittapi/include -isystem /Users/user/Documents/Repos/pytorch/cmake/../third_party/eigen -mmacosx-version-min=10.14 -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -Wnon-virtual-dtor -O3 -DNDEBUG -std=gnu++17 -arch x86_64 -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX15.0.sdk -mmacosx-version-min=10.14 -fPIC -MD -MT third_party/onnx/CMakeFiles/onnx.dir/onnx/checker.cc.o -MF third_party/onnx/CMakeFiles/onnx.dir/onnx/checker.cc.o.d -o third_party/onnx/CMakeFiles/onnx.dir/onnx/checker.cc.o -c /Users/user/Documents/Repos/pytorch/third_party/onnx/onnx/checker.cc
In file included from /Users/user/Documents/Repos/pytorch/third_party/onnx/onnx/checker.cc:15:
/Users/user/Documents/Repos/pytorch/third_party/onnx/onnx/common/file_utils.h:20:20: error: 'path' is unavailable: introduced in macOS 10.15
   20 | std::filesystem::path proto_u8_path = std::filesystem::u8path(proto_path);
      |                    ^
```
and like this:
```
In file included from /Users/user/Documents/Repos/pytorch/aten/src/ATen/native/mkl/SpectralOps.cpp:207:
/Users/user/Documents/Repos/pytorch/third_party/pocketfft/pocketfft_hdronly.h:159:15: error: 'aligned_alloc' is only available on macOS 10.15 or newer [-Werror,-Wunguarded-availability-new]
  159 |   void *ptr = ::aligned_alloc(align,(size+align-1)&(~(align-1)));
      |               ^~~~~~~~~~~~~~~
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX15.0.sdk/usr/include/malloc/_malloc.h:65:35: note: 'aligned_alloc' has been marked as being introduced in macOS 10.15 here, but the deployment target is macOS 10.14.0
   65 | void * __sized_by_or_null(__size) aligned_alloc(size_t __alignment, size_t __size) __result_use_check __alloc_align(1) __alloc_size(2) _MALLOC_TYPED(malloc_type_aligned_alloc, 2) __OSX_AVAILABLE(10.15) __IOS_AVAILABLE(13.0) __TVOS_AVAILABLE(13.0) __WATCHOS_AVAILABLE(6.0);
      |                                   ^
/Users/user/Documents/Repos/pytorch/third_party/pocketfft/pocketfft_hdronly.h:159:15: note: enclose 'aligned_alloc' in a __builtin_available check to silence this warning
  159 |   void *ptr = ::aligned_alloc(align,(size+align-1)&(~(align-1)));
      |               ^~~~~~~~~~~~~~~
  160 |   if (!ptr) throw std::bad_alloc();
  161 |   return ptr;
      |
```
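For the curious: the compiler note is pointing at a real fix, namely guarding the call with `__builtin_available` and falling back to `posix_memalign` (which has been around forever) on older macOS. Purely hypothetically, the patch would look roughly like this; `aligned_alloc_compat` is my own name, not anything in the pocketfft sources:

```cpp
#include <cstdlib>

// Hypothetical sketch: use ::aligned_alloc where the OS provides it
// (macOS 10.15+), fall back to posix_memalign otherwise.
void *aligned_alloc_compat(std::size_t align, std::size_t size) {
  // Round size up to a multiple of align, as pocketfft already does.
  size = (size + align - 1) & ~(align - 1);
#if defined(__APPLE__)
  if (__builtin_available(macOS 10.15, *)) {
    return ::aligned_alloc(align, size);
  }
  void *ptr = nullptr;
  if (::posix_memalign(&ptr, align, size) != 0) {
    return nullptr;
  }
  return ptr;
#else
  return ::aligned_alloc(align, size);
#endif
}
```

Whether upstream pocketfft would take such a patch is another question.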
While one could patch `third_party/onnx/onnx/common/file_utils.h` to support macOS 10.14 (by throwing out Unicode support for paths, lul), I surely won't touch that `aligned_alloc` in pocketfft, which gets pulled in via `aten/src/ATen/native/mkl/SpectralOps.cpp`. Or would that be feasible? I don't think diverging from upstream is a good idea anyway. So yeah, fuck. I don't think we'll ever see the latest version of PyTorch for macOS 10.14 again. Let's raise the deployment target to macOS 10.15. I'm sure not all of these are needed, but whatever:
```
export CFLAGS="-mmacosx-version-min=10.15"
export CXXFLAGS="-mmacosx-version-min=10.15"
export LDFLAGS="-mmacosx-version-min=10.15"
export MACOSX_DEPLOYMENT_TARGET="10.15"
export MACOS_DEPLOYMENT_TARGET="10.15"
```
Ready to be disappointed again, let's run:
```
arch -x86_64 python setup.py bdist_wheel --plat-name=macosx_10_15_x86_64
```
Wow, that actually worked! Checking the compiled tests doesn't look too bad. Or does it?
```
$ cd build
$ arch -x86_64 ctest
...
97% tests passed, 3 tests failed out of 112

Total Test time (real) = 102.66 sec

The following tests FAILED:
	 48 - vec_test_all_types_AVX512 (ILLEGAL)
	 49 - vec_test_all_types_AVX2 (ILLEGAL)
	 80 - scalar_tensor_test (Failed)
Errors while running CTest
Output from these tests are in: /Users/user/Documents/Repos/pytorch/build/Testing/Temporary/LastTest.log
Use "--rerun-failed --output-on-failure" to re-run the failed cases verbosely.
```
```
$ arch -x86_64 ctest --rerun-failed --output-on-failure
Test project /Users/user/Documents/Repos/pytorch/build
    Start 48: vec_test_all_types_AVX512
1/3 Test #48: vec_test_all_types_AVX512 ........***Exception: Illegal  0.03 sec

    Start 49: vec_test_all_types_AVX2
2/3 Test #49: vec_test_all_types_AVX2 ..........***Exception: Illegal  0.48 sec
Running main() from /Users/user/Documents/Repos/pytorch/third_party/googletest/googletest/src/gtest_main.cc
...
    Start 80: scalar_tensor_test
3/3 Test #80: scalar_tensor_test ...............***Failed  0.65 sec
Running main() from /Users/user/Documents/Repos/pytorch/third_party/googletest/googletest/src/gtest_main.cc
[==========] Running 3 tests from 1 test suite.
[----------] Global test environment set-up.
[----------] 3 tests from TestScalarTensor
[ RUN      ] TestScalarTensor.TestScalarTensorCPU
[       OK ] TestScalarTensor.TestScalarTensorCPU (58 ms)
[ RUN      ] TestScalarTensor.TestScalarTensorCUDA
[       OK ] TestScalarTensor.TestScalarTensorCUDA (0 ms)
[ RUN      ] TestScalarTensor.TestScalarTensorMPS
/Users/user/Documents/Repos/pytorch/aten/src/ATen/test/scalar_tensor_test.cpp:229: Failure
Expected equality of these values:
  lhs.numel()
    Which is: 1
  0
[  FAILED  ] TestScalarTensor.TestScalarTensorMPS (178 ms)
[----------] 3 tests from TestScalarTensor (236 ms total)

[----------] Global test environment tear-down
[==========] 3 tests from 1 test suite ran. (237 ms total)
[  PASSED  ] 2 tests.
[  FAILED  ] 1 test, listed below:
[  FAILED  ] TestScalarTensor.TestScalarTensorMPS

 1 FAILED TEST

0% tests passed, 3 tests failed out of 3

Total Test time (real) =   1.19 sec

The following tests FAILED:
	 48 - vec_test_all_types_AVX512 (ILLEGAL)
	 49 - vec_test_all_types_AVX2 (ILLEGAL)
	 80 - scalar_tensor_test (Failed)
Errors while running CTest
```
Eh, I don't mind tests 48 and 49 (presumably Rosetta just can't execute those AVX instruction sets, hence the illegal instruction), but 80 worries me a bit, since it's an MPS test failing in a build that's supposed to have MPS disabled. What do y'all think?
After installing the wheel from `dist` and importing torch, I check the config output:
```
>>> print(torch.__config__.show())
PyTorch built with:
  - GCC 4.2
  - C++ Version: 201703
  - clang 16.0.0
  - Intel(R) MKL-DNN v3.5.3 (Git Hash 66f0cb9eb66affd2da3bf5f8d897376f04aae6af)
  - LAPACK is enabled (usually provided by MKL)
  - NNPACK is enabled
  - CPU capability usage: NO AVX
  - Build settings: BLAS_INFO=accelerate, BUILD_TYPE=Release, COMMIT_SHA=1eba9b3aa3c43f86f4a2c807ac8e12c4a7767340, CXX_COMPILER=/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/c++, CXX_FLAGS= -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOCUPTI -DLIBKINETO_NOROCTRACER -DLIBKINETO_NOXPUPTI=ON -DUSE_FBGEMM -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=braced-scalar-init -Werror=range-loop-construct -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-unused-parameter -Wno-strict-overflow -Wno-strict-aliasing -Wvla-extension -Wsuggest-override -Wnewline-eof -Winconsistent-missing-override -Winconsistent-missing-destructor-override -Wno-pass-failed -Wno-error=old-style-cast -Wconstant-conversion -Wno-missing-braces -Qunused-arguments -fcolor-diagnostics -faligned-new -Wno-unused-but-set-variable -fno-math-errno -fno-trapping-math -Werror=format -DUSE_MPS -Wno-unused-private-field -Wno-missing-braces, LAPACK_INFO=accelerate, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, TORCH_VERSION=2.6.0, USE_CUDA=OFF, USE_CUDNN=OFF, USE_CUSPARSELT=OFF, USE_EIGEN_FOR_BLAS=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_GLOO=OFF, USE_MKL=OFF, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=OFF, USE_NNPACK=ON, USE_OPENMP=OFF, USE_ROCM=OFF, USE_ROCM_KERNEL_ASSERT=OFF,
```
Not exactly what I expected: why are `USE_MKL=OFF` and `USE_OPENMP=OFF`? What do you think, does anything else look wrong?
But hey, it's a start. Let's run the Python tests:
```
pip install -r .ci/docker/requirements-ci.txt
python test/run_test.py --verbose --keep-going
```
This one takes so much time, I dunno. I'm seeing failures, like stuff with MPS, but I'm not sure how much of this is really informative right now. Is there a better way to test whether I compiled PyTorch properly? All in all this feels very messy and I'm not too confident I did everything right here. Inspecting the binary files in the resulting wheel also gives strange results:
```
$ uv run python otool.py <path_to_unpacked_wheel>
10.15 15.0 torch/_C.cpython-313-darwin.so
10.15 15.0 functorch/_C.cpython-313-darwin.so
15.0 15.0 torch/test/c10_ConstexprCrc_test
15.0 15.0 torch/test/dispatch_key_set_test
15.0 15.0 torch/test/type_test
15.0 15.0 torch/test/cpu_allocator_test
15.0 15.0 torch/test/weakref_test
15.0 15.0 torch/test/c10_string_view_test
15.0 15.0 torch/test/c10_exception_test
15.0 15.0 torch/test/packedtensoraccessor_test
15.0 15.0 torch/test/quantized_test
15.0 15.0 torch/test/c10_small_vector_test
15.0 15.0 torch/test/type_ptr_test
15.0 15.0 torch/test/c10_error_test
15.0 15.0 torch/test/c10_SizesAndStrides_test
15.0 15.0 torch/test/scalar_test
15.0 15.0 torch/test/c10_ordered_preserving_dict_test
15.0 15.0 torch/test/math_kernel_test
15.0 15.0 torch/test/kernel_stackbased_test
15.0 15.0 torch/test/c10_Metaprogramming_test
15.0 15.0 torch/test/MaybeOwned_test
15.0 15.0 torch/test/c10_ArrayRef_test
15.0 15.0 torch/test/operator_name_test
15.0 15.0 torch/test/c10_Synchronized_test
15.0 15.0 torch/test/inline_container_test
15.0 15.0 torch/test/c10_ssize_test
15.0 15.0 torch/test/c10_cow_test
15.0 15.0 torch/test/Dimname_test
15.0 15.0 torch/test/cpu_rng_test
15.0 15.0 torch/test/kernel_lambda_legacy_test
15.0 15.0 torch/test/mps_test_objc_interface
15.0 15.0 torch/test/mps_test_allocator
15.0 15.0 torch/test/dlconvertor_test
15.0 15.0 torch/test/cpu_profiling_allocator_test
15.0 15.0 torch/test/c10_intrusive_ptr_test
15.0 15.0 torch/test/pow_test
15.0 15.0 torch/test/c10_DispatchKeySet_test
15.0 15.0 torch/test/c10_NetworkFlow_test
15.0 15.0 torch/test/backend_fallback_test
15.0 15.0 torch/test/c10_InlineStreamGuard_test
15.0 15.0 torch/test/c10_optional_test
15.0 15.0 torch/test/c10_TypeIndex_test
15.0 15.0 torch/test/undefined_tensor_test
15.0 15.0 torch/test/basic
15.0 15.0 torch/test/List_test
15.0 15.0 torch/test/c10_SymInt_test
15.0 15.0 torch/test/c10_intrusive_ptr_benchmark
15.0 15.0 torch/test/extension_backend_test
15.0 15.0 torch/test/c10_Bitset_test
15.0 15.0 torch/test/thread_init_test
15.0 15.0 torch/test/kernel_function_legacy_test
15.0 15.0 torch/test/apply_utils_test
15.0 15.0 torch/test/make_boxed_from_unboxed_functor_test
15.0 15.0 torch/test/c10_Scalar_test
15.0 15.0 torch/test/legacy_vmap_test
15.0 15.0 torch/test/c10_DeviceGuard_test
15.0 15.0 torch/test/CppSignature_test
15.0 15.0 torch/test/reportMemoryUsage_test
15.0 15.0 torch/test/lazy_tensor_test
15.0 15.0 torch/test/mps_test_metal_library
15.0 15.0 torch/test/c10_string_util_test
15.0 15.0 torch/test/reduce_ops_test
15.0 15.0 torch/test/stride_properties_test
15.0 15.0 torch/test/c10_StreamGuard_test
15.0 15.0 torch/test/IListRef_test
15.0 15.0 torch/test/NamedTensor_test
15.0 15.0 torch/test/verify_api_visibility
15.0 15.0 torch/test/test_parallel
15.0 15.0 torch/test/operators_test
15.0 15.0 torch/test/op_allowlist_test
15.0 15.0 torch/test/c10_bit_cast_test
15.0 15.0 torch/test/mps_test_print
15.0 15.0 torch/test/scalar_tensor_test
15.0 15.0 torch/test/c10_Half_test
15.0 15.0 torch/test/c10_registry_test
15.0 15.0 torch/test/xla_tensor_test
15.0 15.0 torch/test/half_test
15.0 15.0 torch/test/c10_complex_math_test
15.0 15.0 torch/test/c10_DeadlockDetection_test
15.0 15.0 torch/test/c10_accumulate_test
15.0 15.0 torch/test/c10_ThreadLocal_test
15.0 15.0 torch/test/native_test
15.0 15.0 torch/test/c10_TypeList_test
15.0 15.0 torch/test/c10_bfloat16_test
15.0 15.0 torch/test/c10_InlineDeviceGuard_test
15.0 15.0 torch/test/wrapdim_test
15.0 15.0 torch/test/op_registration_test
15.0 15.0 torch/test/c10_lazy_test
15.0 15.0 torch/test/atest
15.0 15.0 torch/test/c10_generic_math_test
15.0 15.0 torch/test/kernel_function_test
15.0 15.0 torch/test/kernel_lambda_test
15.0 15.0 torch/test/memory_overlapping_test
15.0 15.0 torch/test/Dict_test
15.0 15.0 torch/test/c10_irange_test
15.0 15.0 torch/test/mobile_memory_cleanup
15.0 15.0 torch/test/c10_tempfile_test
15.0 15.0 torch/test/c10_CompileTimeFunctionPointer_test
15.0 15.0 torch/test/StorageUtils_test
15.0 15.0 torch/test/c10_Device_test
15.0 15.0 torch/test/broadcast_test
15.0 15.0 torch/test/c10_LeftRight_test
15.0 15.0 torch/test/ivalue_test
15.0 15.0 torch/test/c10_flags_test
15.0 15.0 torch/test/c10_TypeTraits_test
15.0 15.0 torch/test/KernelFunction_test
15.0 15.0 torch/test/memory_format_test
15.0 15.0 torch/test/c10_logging_test
15.0 15.0 torch/test/tensor_iterator_test
15.0 15.0 torch/test/cpu_generator_test
15.0 15.0 torch/test/c10_typeid_test
15.0 15.0 torch/test/c10_complex_test
15.0 15.0 torch/bin/test_tensorexpr
15.0 15.0 torch/bin/test_lazy
10.15 15.0 torch/bin/protoc-3.13.0.0
10.15 15.0 torch/bin/torch_shm_manager
15.0 15.0 torch/bin/tutorial_tensorexpr
15.0 15.0 torch/bin/test_edge_op_registration
15.0 15.0 torch/bin/test_jit
15.0 15.0 torch/bin/test_api
10.15 15.0 torch/bin/protoc
10.15 15.0 torch/lib/libtorch_python.dylib
15.0 15.0 torch/lib/libbackend_with_compiler.dylib
10.15 15.0 torch/lib/libtorch.dylib
10.15 15.0 torch/lib/libtorch_global_deps.dylib
10.15 15.0 torch/lib/libtorch_cpu.dylib
15.0 15.0 torch/lib/libjitbackend_test.dylib
10.15 15.0 torch/lib/libc10.dylib
15.0 15.0 torch/lib/libtorchbind_test.dylib
10.15 15.0 torch/lib/libshm.dylib
15.0 15.0 torch/lib/libaoti_custom_ops.dylib
```
Lots of test stuff is included, which I thought I had excluded with `export BUILD_TEST="0"`, and there are also plenty of binaries that require at least macOS 15, which is the version I'm currently on. Bummer. How can I avoid including all the test binaries? And why is it so hard to have all binaries require the minimum macOS version I specified?
So I'm very interested in hearing your suggestions, advice and opinions. Feel free to ask if you have questions about what I did here. Looking forward to getting this community effort started!