I’m trying to build PyTorch from source (cloned at commit `7a79de1`) on a fresh Windows 11 system (Ryzen 7 5800XT, 32 GB DDR4-2666 RAM) with an NVIDIA RTX 5060 Ti 16 GB and CUDA 12.9. Building from source is a hard requirement, since the prebuilt PyTorch binaries don’t yet support this GPU.
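For context, the checkout followed the standard PyTorch source-build steps; the commit and path below match my setup, and the recursive submodule init is required for the build:

```bat
:: Clone PyTorch and check out the commit being built (path matches my setup)
git clone --recursive https://github.com/pytorch/pytorch.git C:\StableDiffusion\pytorch
cd C:\StableDiffusion\pytorch
git checkout 7a79de1
:: Keep submodules in sync with the checked-out commit
git submodule sync
git submodule update --init --recursive
```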
My build consistently fails with `LLVM ERROR: out of memory` followed by an nvcc error (`'""%CICC_PATH%\cicc"' died with status 0xC0000409`) while compiling `aten\src\ATen\native\cuda\SegmentReduce.cu`. The log shows the failure happens specifically while generating the object file for this `.cu` file.
I’ve already taken several troubleshooting steps (the exact command sequence I’m using is sketched after this list):

- Increased Windows virtual memory (page file): set to Initial 49152 MB (48 GB) and Maximum 65536 MB (64 GB), then restarted the PC.
- Limited build jobs: ran `set MAX_JOBS=1` before `python setup.py install`.
- Clean build: deleted the `build` directory before each attempt.
- Environment: Visual Studio 2022 Community (MSVC 19.44.35208.0) Developer Command Prompt, Python 3.10.6 in a virtual environment (`sd_env`), with all Python build dependencies installed.
- System checks: Windows Memory Diagnostic reported no errors, and Event Viewer showed no relevant errors during the previous failures.
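For completeness, each attempt runs roughly the following sequence from the VS 2022 Developer Command Prompt (the virtual-environment path is illustrative; everything else matches the steps above):

```bat
:: From the VS 2022 Developer Command Prompt, inside the source tree
cd C:\StableDiffusion\pytorch

:: Activate the virtual environment (path is illustrative)
call sd_env\Scripts\activate.bat

:: Clean build: remove any previous build output
rmdir /s /q build

:: Limit ninja to a single compile job to reduce peak memory use
set MAX_JOBS=1

python setup.py install
```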
The build log (partial paste below) clearly shows the failure at `SegmentReduce.cu`. It seems the compiler (or a related tool) is hitting a memory limit even with substantial virtual memory and single-threaded compilation.
```
[3400/7521] Building CUDA object caffe2\CMakeFiles\torch_cuda.dir\__\aten\src\ATen\native\cuda\SegmentReduce.cu.obj
FAILED: caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/SegmentReduce.cu.obj
C:\PROGRA~1\NVIDIA~2\CUDA\v12.9\bin\nvcc.exe -forward-unknown-to-host-compiler ... -x cu -c C:\StableDiffusion\pytorch\aten\src\ATen\native\cuda\SegmentReduce.cu -o caffe2\CMakeFiles\torch_cuda.dir\__\aten\src\ATen\native\cuda\SegmentReduce.cu.obj -Xcompiler=-Fdcaffe2\CMakeFiles\torch_cuda.dir\,-FS
LLVM ERROR: out of memory
SegmentReduce.cu
nvcc error : '""%CICC_PATH%\cicc"' died with status 0xC0000409
ninja: build stopped: subcommand failed.
```
Could this be a compatibility issue between CUDA 12.9 and the RTX 5060 Ti’s specific architecture at this particular compilation step? Or is there another compiler flag or environment variable that might help with memory management during nvcc’s LLVM backend (cicc) pass?
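For example, one thing I’m considering is restricting the CUDA architecture list to just this GPU, so cicc only has to generate device code for a single target per file. A minimal sketch, assuming the RTX 5060 Ti reports compute capability 12.0 (sm_120), which can be confirmed with `nvidia-smi --query-gpu=compute_cap --format=csv`:

```bat
:: Assumption: the RTX 5060 Ti is compute capability 12.0 (sm_120).
:: Restricting TORCH_CUDA_ARCH_LIST keeps nvcc/cicc from compiling device code
:: for every supported architecture, which should lower peak compiler memory use.
set TORCH_CUDA_ARCH_LIST=12.0
set MAX_JOBS=1
python setup.py install
```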