MPS Back End Out of Memory on GitHub Action

sammlapp · October 11, 2023, 8:24pm

When running a pytest job on a GitHub Actions macOS runner, I intermittently get this error message:

RuntimeError: MPS backend out of memory (MPS allocated: 0 bytes, other allocations: 0 bytes, max allowed: 1.70 GB). Tried to allocate 0 bytes on private pool. Use PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0 to disable upper limit for memory allocations (may cause system failure).

from a line running a_tensor.to('mps')

If I keep re-running the test it eventually succeeds, but sometimes it takes >5 attempts. I don’t understand why the error is occurring or how to prevent it. Setting the environment variable PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0 as suggested in the error message does not resolve the problem. It occurs, at least occasionally, with Python versions 3.9, 3.10, and 3.11.

The issue does not occur when I run pytest locally on macOS with Apple Silicon.

The GitHub Action lists the following operating system information:

macOS 12.6.9 (21G726)

The installed PyTorch version is torch 2.0.0.

thorinf · October 30, 2023, 3:22pm

Hi,

Did you find any solution to this? I have the exact same issue with GitHub workflows, but all tests pass on my local macOS.

Thanks

gernophil · November 1, 2023, 11:14am

Having the same issue (see here and here), but only since 2.1.0.

sammlapp · November 2, 2023, 5:24pm

I don’t have a solution.

janosh · February 28, 2024, 9:33am

Even GitHub’s new macos-14 runners don’t have access to MPS hardware, unfortunately; see pytorch/pytorch#111449 (comment).
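If the hosted runners cannot really use MPS, one possible workaround is to probe the device before the tests rely on it and fall back to CPU when the probe fails. The following is only a minimal sketch: `pick_device` is a hypothetical helper name, not part of PyTorch, and it assumes the failure surfaces as a `RuntimeError` as in the traceback quoted in this thread.

```python
import importlib.util


def pick_device(preferred="mps", fallback="cpu"):
    """Return `preferred` only if a tiny tensor can actually be moved to it.

    Hypothetical helper for illustration; not a PyTorch API.
    """
    # Without torch installed there is nothing to probe.
    if importlib.util.find_spec("torch") is None:
        return fallback
    import torch

    if preferred == "mps":
        # torch.backends.mps is missing on older torch builds.
        mps = getattr(torch.backends, "mps", None)
        if mps is None or not mps.is_available():
            return fallback
    try:
        # Same kind of call that fails in this thread: a_tensor.to('mps').
        torch.zeros(1).to(preferred)
    except RuntimeError:
        return fallback
    return preferred


device = pick_device()
print(device)
```

Since the error reported here is intermittent, a successful probe does not guarantee that later allocations will succeed; it only filters out runners where even a one-element tensor cannot be moved.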

michaelfeil · June 9, 2024, 2:38am

I used to be able to run this. Since around May, the memory seems to be more limited (or there is some other issue):

RuntimeError: MPS backend out of memory (MPS allocated: 0 bytes, other allocations: 0 bytes, max allowed: 7.93 GB). Tried to allocate 44.71 MB on private pool. Use PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0 to disable upper limit for memory allocations (may cause system failure).

ethanwhite · July 12, 2024, 5:25pm

Switching runs-on from macos-latest (which is running on M1) to macos-13 fixed this issue for me. Hopefully they’ll get the issue with the M1 architecture fixed, but this is working for us for now.
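For projects that need to stay on the Apple Silicon runners, an alternative might be to detect that environment and skip MPS-specific tests there. Below is a rough sketch under a few assumptions: `is_github_arm_mac` is a made-up helper name, it relies on GitHub Actions setting the `GITHUB_ACTIONS` environment variable to `true`, and on `platform.machine()` reporting `arm64` on Apple Silicon.

```python
import os
import platform


def is_github_arm_mac(env=None, system=None, machine=None):
    """Guess whether we are on a GitHub-hosted Apple Silicon macOS runner.

    Hypothetical helper; the arguments exist only to make it easy to test.
    """
    env = os.environ if env is None else env
    system = platform.system() if system is None else system
    machine = platform.machine() if machine is None else machine
    return (
        env.get("GITHUB_ACTIONS") == "true"  # set by GitHub Actions
        and system == "Darwin"               # macOS
        and machine == "arm64"               # Apple Silicon
    )


print(is_github_arm_mac())
```

In a pytest suite the result could feed a marker such as `pytest.mark.skipif(is_github_arm_mac(), reason="MPS unusable on hosted runner")`, so the MPS tests still run locally on Apple Silicon.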