I’m trying to build PyTorch from source (cloned commit 7a79de1) on a fresh Windows 11 system (Ryzen 7 5800XT, 32GB DDR4-2666 RAM) with an NVIDIA RTX 5060 Ti 16GB and CUDA 12.9. Building from source is a hard requirement, since the prebuilt PyTorch binaries don’t yet support this GPU.
My build consistently fails with “LLVM ERROR: out of memory”, followed by `nvcc error : '""%CICC_PATH%\cicc"' died with status 0xC0000409`, when compiling `aten\src\ATen\native\cuda\SegmentReduce.cu`. The log shows the failure occurs specifically during object-file generation for this one `.cu` file.
I’ve already taken several troubleshooting steps:
- Increased Windows Virtual Memory (Page File): Set Initial to 49152 MB (48 GB) and Maximum to 65536 MB (64 GB), then restarted the PC.
- Limited Build Jobs: Ran `set MAX_JOBS=1` before `python setup.py install`.
- Clean Build: Deleted the `build` directory before each attempt.
- Environment: Visual Studio 2022 Community (MSVC 19.44.35208.0) Developer Command Prompt, Python 3.10.6 in a virtual environment (`sd_env`), and all Python build dependencies installed.
- System Checks: Windows Memory Diagnostic reported no errors; Event Viewer showed no relevant errors during previous failures.
The build log (partial paste below) clearly shows the failure at `SegmentReduce.cu`. It seems the compiler (or a related tool) is hitting a memory limit even with substantial virtual memory and single-threaded compilation.
```
[3400/7521] Building CUDA object caffe2\CMakeFiles\torch_cuda.dir\__\aten\src\ATen\native\cuda\SegmentReduce.cu.obj
FAILED: caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/SegmentReduce.cu.obj
C:\PROGRA~1\NVIDIA~2\CUDA\v12.9\bin\nvcc.exe -forward-unknown-to-host-compiler ... -x cu -c C:\StableDiffusion\pytorch\aten\src\ATen\native\cuda\SegmentReduce.cu -o caffe2\CMakeFiles\torch_cuda.dir\__\aten\src\ATen\native\cuda\SegmentReduce.cu.obj -Xcompiler=-Fdcaffe2\CMakeFiles\torch_cuda.dir\,-FS
LLVM ERROR: out of memory
SegmentReduce.cu
nvcc error : '""%CICC_PATH%\cicc"' died with status 0xC0000409
ninja: build stopped: subcommand failed.
```
Could this be a compatibility issue between CUDA 12.9 and the RTX 5060 Ti’s specific architecture during this particular compilation step? Or is there another compiler flag or environment variable that might help with memory management in nvcc’s LLVM backend?
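For what it’s worth, these are the knobs I’m considering trying next; I haven’t verified them yet, and I’m assuming `TORCH_CUDA_ARCH_LIST` and `TORCH_NVCC_FLAGS` are still the environment variables PyTorch’s CMake honors:

```shell
:: Compile device code only for the RTX 5060 Ti's architecture
:: (Blackwell, compute capability 12.0) instead of the full default
:: arch list, which should cut per-file memory use in cicc.
set TORCH_CUDA_ARCH_LIST=12.0

:: Extra flags forwarded to nvcc; --threads 1 keeps nvcc's own
:: per-architecture compilation single-threaded.
set TORCH_NVCC_FLAGS=--threads 1
```

Would restricting the arch list like this plausibly reduce the memory footprint of the failing step, or is the OOM independent of how many architectures are being compiled?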