Build and runtime issues on macOS Sonoma Beta

Has anyone had success building on the macOS Sonoma Beta? I’m using Beta 2 on two of my devices and have experienced a few issues:

  • Build hang when building PyTorch from source with Xcode 15 Beta 2: clang appears to enter an infinite loop while compiling and linking aten/src/ATen/native/cpu/ReduceOpsKernel.cpp.DEFAULT.cpp. I haven’t yet had the opportunity to disable SIP and use DTrace to figure out what is going on. I’ve also been able to reproduce this on Ventura using Xcode 15 Beta 2, so it seems to be Xcode-specific. My current theory is that it’s related to the new linker, but that’s just speculation.

  • PyTorch 2.0.1 and nightly builds from the PyTorch conda channel crash on my MacBook Pro M2 with a bus error. I was running a few different experiments and haven’t yet had the opportunity to investigate further. They run fine on my MacBook Pro M1 with Sonoma Beta 2 installed, so this appears to be M2-specific.

Has anyone else experienced these issues or had more success?

Thanks


clang does not hang when building aten/src/ATen/native/cpu/ReduceOpsKernel.cpp.DEFAULT.cpp if using CMAKE_BUILD_TYPE=Debug, which suggests the hang is related to clang optimization in Xcode 15 Beta 2. I also verified it occurs with Xcode 15 Beta 1.
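As a stopgap, the Debug-build observation suggests a workaround: build PyTorch without optimization until the Xcode 15 beta clang is fixed. A minimal sketch (assuming a standard source checkout; the exact invocation may differ for your setup):

```shell
# Workaround sketch: the hang did not occur under a Debug build,
# so disable optimization for the whole build.
export CMAKE_BUILD_TYPE=Debug   # or: export DEBUG=1, PyTorch's setup.py shortcut
python setup.py develop
```

The resulting binaries will be noticeably slower, so this is only useful for unblocking development, not benchmarking.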

I was able to reproduce the bus error on Sonoma by running python test/test_mps.py at HEAD:

...
loc("mps_select"("(mpsFileLoc): /AppleInternal/Library/BuildRoots/fe2afe83-06e7-11ee-80c3-f6357a1003e8/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShadersGraph/mpsgraph/MetalPerformanceShadersGraph/Core/Files/MPSGraphUtilities.mm":294:0)): error: 'anec.gain_offset_control' op result #0 must be 4D/5D memref of 16-bit float or 8-bit signed integer or 8-bit unsigned integer values, but got 'memref<1x5x1x5xi1>'
loc("mps_select"("(mpsFileLoc): /AppleInternal/Library/BuildRoots/fe2afe83-06e7-11ee-80c3-f6357a1003e8/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShadersGraph/mpsgraph/MetalPerformanceShadersGraph/Core/Files/MPSGraphUtilities.mm":294:0)): error: 'anec.gain_offset_control' op result #0 must be 4D/5D memref of 16-bit float or 8-bit signed integer or 8-bit unsigned integer values, but got 'memref<5x10x1x5xi1>'
...../Users/dlewis/miniconda3/envs/pytorchnightly/lib/python3.11/site-packages/torch/autograd/__init__.py:319: UserWarning: An output with one or more elements was resized since it had shape [], which does not match the required output shape [1]. This behavior is deprecated, and in a future PyTorch release outputs will not be resized unless they have zero elements. You can explicitly reuse an out tensor t by resizing it, inplace, to zero elements with t.resize_(0). (Triggered internally at /Users/dlewis/work/repos/third-party/pytorch/pytorch/aten/src/ATen/native/Resize.cpp:35.)
  result = Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
...........x.........................xx..x...x.xxxxxx....xx..xx...xx.....ssssxxxxxxxxxxx..xx..xx......./Users/dlewis/miniconda3/envs/pytorchnightly/lib/python3.11/site-packages/torch/testing/_internal/opinfo/core.py:1107: UserWarning: Using padding='same' with even kernel lengths and odd dilation may require a zero-padded copy of the input be created (Triggered internally at /Users/dlewis/work/repos/third-party/pytorch/pytorch/aten/src/ATen/native/Convolution.cpp:1012.)
  return self.op(*args, **kwargs)
...x....zsh: bus error  python test/test_mps.py

The bus error only happens on my MacBook Pro M2, not my M1, so it may be M2/Metal-specific on Sonoma?
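For anyone who wants to check without running the whole test suite, here is a guarded minimal repro sketch. It is an assumption on my part that the failing "mps_select" op corresponds to a `torch.where` on a boolean mask (the `i1` memrefs in the log), and the 5x10 shape is taken from the log output; the function name is hypothetical.

```python
def try_mps_select_backward():
    """Hypothetical minimal repro for the mps_select crash above.
    Returns a status string instead of asserting, so it is safe to run
    on machines without torch or without an MPS device."""
    try:
        import torch
    except ImportError:
        return "torch not installed"
    if not torch.backends.mps.is_available():
        return "mps not available"
    # Boolean mask -> the i1 memref seen in the MPSGraph error messages
    cond = torch.rand(5, 10, device="mps") > 0.5
    a = torch.randn(5, 10, device="mps", requires_grad=True)
    b = torch.randn(5, 10, device="mps")
    loss = torch.where(cond, a, b).sum()
    loss.backward()  # bus error reported at this point on M2 + Sonoma
    return "backward completed"

print(try_mps_select_backward())
```

If this crashes on your M2 but completes on an M1, that would support the Metal/ANE-lowering theory.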

I also ran into this issue on a Mac M2 with the Sonoma 14.0 release.
Torch crashes on the mps device during the backward pass and/or loss calculation.
Any progress on this one?

Hi, I am having the same issue. My code was working fine before the update. Is there any update on the cause, or a known fix?