PyTorch build from source hangs with a few errors


[4742/6394] Generating ../../../include/sleef.h
Generating sleef.h: mkrename cinz_ 2 4 __m128d __m128 __m128i __m128i __SSE2__
Generating sleef.h: mkrename cinz_ 2 4 __m128d __m128 __m128i __m128i __SSE2__ sse2
Generating sleef.h: mkrename cinz_ 2 4 __m128d __m128 __m128i __m128i __SSE2__ sse4
Generating sleef.h: mkrename cinz_ 4 8 __m256d __m256 __m128i struct\ {\ __m128i\ x,\ y;\ } __AVX__
Generating sleef.h: mkrename cinz_ 4 8 __m256d __m256 __m128i struct\ {\ __m128i\ x,\ y;\ } __AVX__ avx
Generating sleef.h: mkrename finz_ 4 8 __m256d __m256 __m128i struct\ {\ __m128i\ x,\ y;\ } __AVX__ fma4
Generating sleef.h: mkrename finz_ 4 8 __m256d __m256 __m128i __m256i __AVX__ avx2
Generating sleef.h: mkrename finz_ 2 4 __m128d __m128 __m128i __m128i __SSE2__ avx2128
Generating sleef.h: mkrename finz_ 8 16 __m512d __m512 __m256i __m512i __AVX512F__
Generating sleef.h: mkrename finz_ 8 16 __m512d __m512 __m256i __m512i __AVX512F__ avx512f
Generating sleef.h: mkrename cinz_ 8 16 __m512d __m512 __m256i __m512i __AVX512F__ avx512fnofma
Generating sleef.h: mkrename cinz_ 1 1 double float int32_t int32_t __STDC__ purec
Generating sleef.h: mkrename finz_ 1 1 double float int32_t int32_t FP_FAST_FMA purecfma
[4969/6394] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/RNN.cpp.o
FAILED: caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/RNN.cpp.o
/usr/bin/c++ -DAT_PER_OPERATOR_HEADERS -DBUILD_ONEDNN_GRAPH -DCPUINFO_SUPPORTED_PLATFORM=1 -DFMT_HEADER_ONLY=1 -DFXDIV_USE_INLINE_ASSEMBLY=0 -DHAVE_MALLOC_USABLE_SIZE=1 -DHAVE_MMAP=1 -DHAVE_SHM_OPEN=1 -DHAVE_SHM_UNLINK=1 -DIDEEP_USE_MKL -DMINIZ_DISABLE_ZIP_READER_CRC32_CHECKS -DNNP_CONVOLUTION_ONLY=0 -DNNP_INFERENCE_ONLY=0 -DONNXIFI_ENABLE_EXT=1 -DONNX_ML=1 -DONNX_NAMESPACE=onnx_torch -DUSE_C10D_GLOO -DUSE_C10D_MPI -DUSE_DISTRIBUTED -DUSE_EXTERNAL_MZCRC -DUSE_FLASH_ATTENTION -DUSE_RPC -DUSE_TENSORPIPE -D_FILE_OFFSET_BITS=64 -Dtorch_cpu_EXPORTS -I/output/workspace/pytorch/build/aten/src -I/output/workspace/pytorch/aten/src -I/output/workspace/pytorch/build -I/output/workspace/pytorch -I/output/workspace/pytorch/cmake/…/third_party/cudnn_frontend/include -I/output/workspace/pytorch/third_party/onnx -I/output/workspace/pytorch/build/third_party/onnx -I/output/workspace/pytorch/third_party/foxi -I/output/workspace/pytorch/build/third_party/foxi -I/output/workspace/pytorch/torch/csrc/api -I/output/workspace/pytorch/torch/csrc/api/include -I/output/workspace/pytorch/caffe2/aten/src/TH -I/output/workspace/pytorch/build/caffe2/aten/src/TH -I/output/workspace/pytorch/build/caffe2/aten/src -I/output/workspace/pytorch/build/caffe2/…/aten/src -I/output/workspace/pytorch/torch/csrc -I/output/workspace/pytorch/third_party/miniz-2.1.0 -I/output/workspace/pytorch/third_party/kineto/libkineto/include -I/output/workspace/pytorch/third_party/kineto/libkineto/src -I/output/workspace/pytorch/aten/…/third_party/catch/single_include -I/output/workspace/pytorch/aten/src/ATen/… -I/output/workspace/pytorch/third_party/FXdiv/include -I/output/workspace/pytorch/c10/… -I/output/workspace/pytorch/third_party/pthreadpool/include -I/output/workspace/pytorch/third_party/cpuinfo/include -I/output/workspace/pytorch/third_party/QNNPACK/include -I/output/workspace/pytorch/aten/src/ATen/native/quantized/cpu/qnnpack/include -I/output/workspace/pytorch/aten/src/ATen/native/quantized/cpu/qnnpack/src 
-I/output/workspace/pytorch/third_party/cpuinfo/deps/clog/include -I/output/workspace/pytorch/third_party/NNPACK/include -I/output/workspace/pytorch/third_party/fbgemm/include -I/output/workspace/pytorch/third_party/fbgemm -I/output/workspace/pytorch/third_party/fbgemm/third_party/asmjit/src -I/output/workspace/pytorch/third_party/ittapi/src/ittnotify -I/output/workspace/pytorch/third_party/FP16/include -I/output/workspace/pytorch/third_party/tensorpipe -I/output/workspace/pytorch/build/third_party/tensorpipe -I/output/workspace/pytorch/third_party/tensorpipe/third_party/libnop/include -I/output/workspace/pytorch/third_party/fmt/include -I/output/workspace/pytorch/build/third_party/ideep/mkl-dnn/third_party/oneDNN/include -I/output/workspace/pytorch/third_party/ideep/mkl-dnn/third_party/oneDNN/src/…/include -I/output/workspace/pytorch/third_party/flatbuffers/include -isystem /output/workspace/pytorch/build/third_party/gloo -isystem /output/workspace/pytorch/cmake/…/third_party/gloo -isystem /output/workspace/pytorch/third_party/protobuf/src -isystem /opt/conda/include -isystem /output/workspace/pytorch/third_party/gemmlowp -isystem /output/workspace/pytorch/third_party/neon2sse -isystem /output/workspace/pytorch/third_party/XNNPACK/include -isystem /output/workspace/pytorch/third_party/ittapi/include -isystem /output/workspace/pytorch/cmake/…/third_party/eigen -isystem /opt/hpcx/ompi/include -isystem /opt/hpcx/ompi/include/openmpi -isystem /opt/hpcx/ompi/include/openmpi/opal/mca/hwloc/hwloc201/hwloc/include -isystem /opt/hpcx/ompi/include/openmpi/opal/mca/event/libevent2022/libevent -isystem /opt/hpcx/ompi/include/openmpi/opal/mca/event/libevent2022/libevent/include -isystem /usr/local/cuda/include -isystem /output/workspace/pytorch/third_party/ideep/mkl-dnn/third_party/oneDNN/include -isystem /output/workspace/pytorch/third_party/ideep/include -isystem /output/workspace/pytorch/third_party/ideep/mkl-dnn/include -isystem /output/workspace/pytorch/build/include 
-fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wunused-local-typedefs -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow -DHAVE_AVX2_CPU_DEFINITION -g -fno-omit-frame-pointer -O0 -fPIC -DCAFFE2_USE_GLOO -DTH_HAVE_THREAD -Wall -Wextra -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-missing-field-initializers -Wno-write-strings -Wno-unknown-pragmas -Wno-type-limits -Wno-array-bounds -Wno-sign-compare -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-missing-braces -Wno-maybe-uninitialized -fvisibility=hidden -fopenmp -DCAFFE2_BUILD_MAIN_LIB -pthread -DASMJIT_STATIC -std=gnu++14 -Wno-deprecated-declarations -MD -MT caffe2/CMakeFiles/torch_cpu.dir/
__/aten/src/ATen/native/RNN.cpp.o -MF caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/RNN.cpp.o.d -o caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/RNN.cpp.o -c /output/workspace/pytorch/aten/src/ATen/native/RNN.cpp
In file included from /output/workspace/pytorch/aten/src/ATen/native/quantized/cpu/QnnpackUtils.h:8,
from /output/workspace/pytorch/aten/src/ATen/native/RNN.cpp:8:
/output/workspace/pytorch/aten/src/ATen/native/quantized/cpu/XnnpackUtils.h: In function ‘xnn_status at::native::xnnp_utils::xnnp_create_convolution2d_nhwc(uint32_t, uint32_t, uint32_t, uint32_t, uint32_t, uint32_t, uint32_t, uint32_t, uint32_t, uint32_t, uint32_t, size_t, size_t, size_t, size_t, int8_t, float, int8_t, const float*, const int8_t*, const int32_t*, int8_t, float, int8_t, int8_t, uint32_t, xnn_operator**, bool, bool)’:
/output/workspace/pytorch/aten/src/ATen/native/quantized/cpu/XnnpackUtils.h:102:9: error: cannot convert ‘xnn_operator**’ to ‘xnn_caches_t’ {aka ‘const xnn_caches*’}
  102 |         op); /* xnn_operator_t* deconvolution_op_out */
      |         ^~
      |         |
      |         xnn_operator**
In file included from /output/workspace/pytorch/aten/src/ATen/native/xnnpack/Common.h:7,
from /output/workspace/pytorch/aten/src/ATen/native/quantized/cpu/XnnpackUtils.h:7,
from /output/workspace/pytorch/aten/src/ATen/native/quantized/cpu/QnnpackUtils.h:8,
from /output/workspace/pytorch/aten/src/ATen/native/RNN.cpp:8:
/output/workspace/pytorch/third_party/XNNPACK/include/xnnpack.h:3479:16: note: initializing argument 26 of ‘xnn_status xnn_create_deconvolution2d_nhwc_qs8(uint32_t, uint32_t, uint32_t, uint32_t, uint32_t, uint32_t, uint32_t, uint32_t, uint32_t, uint32_t, uint32_t, size_t, size_t, size_t, size_t, int8_t, float, float, const int8_t*, const int32_t*, int8_t, float, int8_t, int8_t, uint32_t, xnn_caches_t, xnn_operator**)’
 3479 |   xnn_caches_t caches,
      |   ~~~~~~~~~~~~~^~~~~~
In file included from /output/workspace/pytorch/aten/src/ATen/native/quantized/cpu/QnnpackUtils.h:8,
from /output/workspace/pytorch/aten/src/ATen/native/RNN.cpp:8:
/output/workspace/pytorch/aten/src/ATen/native/quantized/cpu/XnnpackUtils.h:133:9: error: cannot convert ‘xnn_operator**’ to ‘xnn_caches_t’ {aka ‘const xnn_caches*’}
  133 |         op); /* xnn_operator_t* convolution_op_out */
      |         ^~
      |         |
      |         xnn_operator**
In file included from /output/workspace/pytorch/aten/src/ATen/native/xnnpack/Common.h:7,
from /output/workspace/pytorch/aten/src/ATen/native/quantized/cpu/XnnpackUtils.h:7,
from /output/workspace/pytorch/aten/src/ATen/native/quantized/cpu/QnnpackUtils.h:8,
from /output/workspace/pytorch/aten/src/ATen/native/RNN.cpp:8:
/output/workspace/pytorch/third_party/XNNPACK/include/xnnpack.h:3441:16: note: initializing argument 26 of ‘xnn_status xnn_create_convolution2d_nhwc_qs8(uint32_t, uint32_t, uint32_t, uint32_t, uint32_t, uint32_t, uint32_t, uint32_t, uint32_t, uint32_t, uint32_t, size_t, size_t, size_t, size_t, int8_t, float, float, const int8_t*, const int32_t*, int8_t, float, int8_t, int8_t, uint32_t, xnn_caches_t, xnn_operator**)’
 3441 |   xnn_caches_t caches,
      |   ~~~~~~~~~~~~~^~~~~~
In file included from /output/workspace/pytorch/aten/src/ATen/native/quantized/cpu/QnnpackUtils.h:8,
from /output/workspace/pytorch/aten/src/ATen/native/RNN.cpp:8:
/output/workspace/pytorch/aten/src/ATen/native/quantized/cpu/XnnpackUtils.h:161:9: error: cannot convert ‘xnn_operator**’ to ‘xnn_caches_t’ {aka ‘const xnn_caches*’}
  161 |         op); /* xnn_operator_t* convolution_op_out */
      |         ^~
      |         |
      |         xnn_operator**
In file included from /output/workspace/pytorch/aten/src/ATen/native/xnnpack/Common.h:7,
from /output/workspace/pytorch/aten/src/ATen/native/quantized/cpu/XnnpackUtils.h:7,
from /output/workspace/pytorch/aten/src/ATen/native/quantized/cpu/QnnpackUtils.h:8,
from /output/workspace/pytorch/aten/src/ATen/native/RNN.cpp:8:
/output/workspace/pytorch/third_party/XNNPACK/include/xnnpack.h:3357:16: note: initializing argument 26 of ‘xnn_status xnn_create_convolution2d_nhwc_qc8(uint32_t, uint32_t, uint32_t, uint32_t, uint32_t, uint32_t, uint32_t, uint32_t, uint32_t, uint32_t, uint32_t, size_t, size_t, size_t, size_t, int8_t, float, const float*, const int8_t*, const int32_t*, int8_t, float, int8_t, int8_t, uint32_t, xnn_caches_t, xnn_operator**)’
 3357 |   xnn_caches_t caches,
      |   ~~~~~~~~~~~~~^~~~~~
In file included from /output/workspace/pytorch/aten/src/ATen/native/quantized/cpu/QnnpackUtils.h:8,
from /output/workspace/pytorch/aten/src/ATen/native/RNN.cpp:8:
/output/workspace/pytorch/aten/src/ATen/native/quantized/cpu/XnnpackUtils.h: In function ‘xnn_status at::native::xnnp_utils::xnnp_create_fully_connected_nc(size_t, size_t, size_t, size_t, int8_t, float, int8_t, float, const int8_t*, const int32_t*, int8_t, float, int8_t, int8_t, uint32_t, xnn_operator**)’:
/output/workspace/pytorch/aten/src/ATen/native/quantized/cpu/XnnpackUtils.h:257:7: error: cannot convert ‘xnn_operator**’ to ‘xnn_caches_t’ {aka ‘const xnn_caches*’}
  257 |       fully_connected_op_out); /* xnn_operator_t* fully_connected_op_out */
      |       ^~~~~~~~~~~~~~~~~~~~~~
      |       |
      |       xnn_operator**
In file included from /output/workspace/pytorch/aten/src/ATen/native/xnnpack/Common.h:7,
from /output/workspace/pytorch/aten/src/ATen/native/quantized/cpu/XnnpackUtils.h:7,
from /output/workspace/pytorch/aten/src/ATen/native/quantized/cpu/QnnpackUtils.h:8,
from /output/workspace/pytorch/aten/src/ATen/native/RNN.cpp:8:
/output/workspace/pytorch/third_party/XNNPACK/include/xnnpack.h:3529:16: note: initializing argument 15 of ‘xnn_status xnn_create_fully_connected_nc_qs8(size_t, size_t, size_t, size_t, int8_t, float, float, const int8_t*, const int32_t*, int8_t, float, int8_t, int8_t, uint32_t, xnn_caches_t, xnn_operator**)’
 3529 |   xnn_caches_t caches,
      |   ~~~~~~~~~~~~~^~~~~~
[4985/6394] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/TensorAdvancedIndexing.cpp.o

After a few attempts, there is still no progress beyond the output above.

My basic setup:
cuda 11.7.x
cudnn 8.x
pytorch 1.13.0

The error is raised in:

/output/workspace/pytorch/aten/src/ATen/native/quantized/cpu/XnnpackUtils.h:102:9: error: cannot convert ‘xnn_operator**’ to ‘xnn_caches_t’ {aka ‘const xnn_caches*’}

so you might want to disable building NNPACK.

Thank you very much, @ptrblck.
No, the build actually uses the default USE_NNPACK=ON; my actual build command is

DEBUG=1 USE_CUDA=1 USE_DISTRIBUTED=0 python setup.py develop
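Following the suggestion to disable NNPACK, the rebuild can be sketched like this (a minimal sketch: the flags are the environment variables PyTorch's setup.py reads, and the long-running build step itself is commented out):

```shell
# Sketch of a rebuild with NNPACK disabled; PyTorch's setup.py picks
# these flags up from the environment (values shown are assumptions).
export DEBUG=1 USE_CUDA=1 USE_DISTRIBUTED=0
export USE_NNPACK=0
echo "DEBUG=$DEBUG USE_CUDA=$USE_CUDA USE_DISTRIBUTED=$USE_DISTRIBUTED USE_NNPACK=$USE_NNPACK"
# python setup.py develop   # the actual build step, commented out here
```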

After a few such failures, I rebuilt with USE_NNPACK=0, which also does not work; the stdout is as follows:

...
Generating sleef.h: mkrename cinz_ 8 16 __m512d __m512 __m256i __m512i __AVX512F__ avx512fnofma
Generating sleef.h: mkrename cinz_ 1 1 double float int32_t int32_t __STDC__ purec
Generating sleef.h: mkrename finz_ 1 1 double float int32_t int32_t FP_FAST_FMA purecfma
[5215/6612] Building C object caffe2/CMakeFiles/torch_cpu.dir/__/third_party/miniz-2.1.0/miniz.c.o
/output/workspace/pytorch/third_party/miniz-2.1.0/miniz.c:3157:9: note: #pragma message: Using fopen, ftello, fseeko, stat() etc. path for file I/O - this path may not support large files.
 3157 | #pragma message("Using fopen, ftello, fseeko, stat() etc. path for file I/O - this path may not support large files.")
      |         ^~~~~~~
[5242/6612] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/codegen/onednn/interface.cpp.o
/output/workspace/pytorch/torch/csrc/jit/codegen/onednn/interface.cpp: In function ‘torch::jit::Operation torch::jit::createLlgaKernel(const torch::jit::Node*)’:
/output/workspace/pytorch/torch/csrc/jit/codegen/onednn/interface.cpp:97:3: warning: ‘torch::jit::Operation::Operation(F&&) [with F = torch::jit::createLlgaKernel(const torch::jit::Node*)::<lambda(torch::jit::Stack*)>; typename std::enable_if<std::is_constructible<std::function<void(std::vector<c10::IValue>*)>, F&&>::value, int>::type <anonymous> = 0]’ is deprecated: Please use void(Stack&) to register operator instead. [-Wdeprecated-declarations]
   97 |   };
      |   ^
In file included from /output/workspace/pytorch/aten/src/ATen/core/boxing/KernelFunction.h:5,
                 from /output/workspace/pytorch/aten/src/ATen/core/dispatch/Dispatcher.h:4,
                 from /output/workspace/pytorch/torch/csrc/jit/runtime/operator.h:6,
                 from /output/workspace/pytorch/torch/csrc/jit/ir/ir.h:7,
                 from /output/workspace/pytorch/torch/csrc/jit/codegen/onednn/defer_size_check.h:3,
                 from /output/workspace/pytorch/torch/csrc/jit/codegen/onednn/interface.cpp:2:
/output/workspace/pytorch/aten/src/ATen/core/stack.h:25:3: note: declared here
   25 |   Operation(F&& raw): op_([raw = std::forward<F>(raw)](Stack& stack) {
      |   ^~~~~~~~~
/output/workspace/pytorch/torch/csrc/jit/codegen/onednn/interface.cpp: In function ‘torch::jit::Operation torch::jit::createLlgaGuardKernel(const torch::jit::Node*)’:
/output/workspace/pytorch/torch/csrc/jit/codegen/onednn/interface.cpp:162:3: warning: ‘torch::jit::Operation::Operation(F&&) [with F = torch::jit::createLlgaGuardKernel(const torch::jit::Node*)::<lambda(torch::jit::Stack*)>; typename std::enable_if<std::is_constructible<std::function<void(std::vector<c10::IValue>*)>, F&&>::value, int>::type <anonymous> = 0]’ is deprecated: Please use void(Stack&) to register operator instead. [-Wdeprecated-declarations]
  162 |   };
      |   ^
In file included from /output/workspace/pytorch/aten/src/ATen/core/boxing/KernelFunction.h:5,
                 from /output/workspace/pytorch/aten/src/ATen/core/dispatch/Dispatcher.h:4,
                 from /output/workspace/pytorch/torch/csrc/jit/runtime/operator.h:6,
                 from /output/workspace/pytorch/torch/csrc/jit/ir/ir.h:7,
                 from /output/workspace/pytorch/torch/csrc/jit/codegen/onednn/defer_size_check.h:3,
                 from /output/workspace/pytorch/torch/csrc/jit/codegen/onednn/interface.cpp:2:
/output/workspace/pytorch/aten/src/ATen/core/stack.h:25:3: note: declared here
   25 |   Operation(F&& raw): op_([raw = std::forward<F>(raw)](Stack& stack) {
      |   ^~~~~~~~~
[5739/6612] Linking CXX shared library lib/libtorch_cpu.so
...
...the build hangs here for a long time; after about two hours it exits with the exception below...
...
Compiling  reduce_scatter.cu                   > /output/workspace/pytorch/build/nccl/obj/collectives/device/reduce_scatter_sumpostdiv_f64.o
Compiling  reduce_scatter.cu                   > /output/workspace/pytorch/build/nccl/obj/collectives/device/reduce_scatter_sumpostdiv_bf16.o
Compiling  functions.cu                        > /output/workspace/pytorch/build/nccl/obj/collectives/device/functions.o
Compiling  onerank_reduce.cu                   > /output/workspace/pytorch/build/nccl/obj/collectives/device/onerank_reduce.o
c++: fatal error: Killed signal terminated program cc1plus
compilation terminated.
make[2]: *** [Makefile:73: /output/workspace/pytorch/build/nccl/obj/collectives/device/devlink.o] Error 1
make[2]: Leaving directory '/output/workspace/pytorch/third_party/nccl/nccl/src/collectives/device'
make[1]: *** [Makefile:51: /output/workspace/pytorch/build/nccl/obj/collectives/device/colldevice.a] Error 2
make[1]: Leaving directory '/output/workspace/pytorch/third_party/nccl/nccl/src'
make: *** [Makefile:25: src.build] Error 2
ninja: build stopped: subcommand failed.

And a rebuild without cleaning first produces the output below:

...
-- Generating done
-- Build files have been written to: /output/workspace/pytorch/build
[24/903] Building CXX object third_party/kineto/libkineto/CMakeFiles/kineto_base.dir/src/Logger.cpp.o
/output/workspace/pytorch/third_party/kineto/libkineto/src/Logger.cpp:28:32: warning: unknown option after ‘#pragma GCC diagnostic’ kind [-Wpragmas]
   28 | #pragma GCC diagnostic ignored "-Wglobal-constructors"
      |                                ^~~~~~~~~~~~~~~~~~~~~~~
[30/903] Performing build step for 'nccl_external'
FAILED: nccl_external-prefix/src/nccl_external-stamp/nccl_external-build nccl/lib/libnccl_static.a /output/workspace/pytorch/build/nccl_external-prefix/src/nccl_external-stamp/nccl_external-build /output/workspace/pytorch/build/nccl/lib/libnccl_static.a
cd /output/workspace/pytorch/third_party/nccl/nccl && make -j8 -l8 CXX=/usr/bin/c++ CUDA_HOME=/usr/local/cuda NVCC=/usr/local/cuda/bin/nvcc "NVCC_GENCODE=-gencode=arch=compute_52,code=sm_52 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_75,code=sm_75 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_86,code=sm_86 -gencode=arch=compute_86,code=compute_86" BUILDDIR=/output/workspace/pytorch/build/nccl VERBOSE=0 --jobs=8 && /opt/conda/bin/cmake -E touch /output/workspace/pytorch/build/nccl_external-prefix/src/nccl_external-stamp/nccl_external-build
make -C src build BUILDDIR=/output/workspace/pytorch/build/nccl
make[1]: Entering directory '/output/workspace/pytorch/third_party/nccl/nccl/src'
Compiling  bootstrap.cc                        > /output/workspace/pytorch/build/nccl/obj/bootstrap.o
Compiling  transport.cc                        > /output/workspace/pytorch/build/nccl/obj/transport.o
Compiling  enqueue.cc                          > /output/workspace/pytorch/build/nccl/obj/enqueue.o
Compiling  group.cc                            > /output/workspace/pytorch/build/nccl/obj/group.o
Compiling  debug.cc                            > /output/workspace/pytorch/build/nccl/obj/debug.o
Compiling  proxy.cc                            > /output/workspace/pytorch/build/nccl/obj/proxy.o
make[2]: Entering directory '/output/workspace/pytorch/third_party/nccl/nccl/src/collectives/device'
c++: fatal error: Killed signal terminated program cc1plus
compilation terminated.
make[2]: *** [Makefile:73: /output/workspace/pytorch/build/nccl/obj/collectives/device/devlink.o] Error 1
make[2]: Leaving directory '/output/workspace/pytorch/third_party/nccl/nccl/src/collectives/device'
make[1]: *** [Makefile:51: /output/workspace/pytorch/build/nccl/obj/collectives/device/colldevice.a] Error 2
make[1]: Leaving directory '/output/workspace/pytorch/third_party/nccl/nccl/src'
make: *** [Makefile:25: src.build] Error 2
[31/903] Linking CXX shared library lib/libtorch_cpu.so
ninja: build stopped: subcommand failed.

c++: fatal error: Killed signal terminated program cc1plus
might indicate you are running out of host RAM, so try reducing the number of parallel compile processes via MAX_JOBS=1 (or another value that does not exhaust your RAM).
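To pick a MAX_JOBS value that fits the machine, a rough heuristic is to budget a few GB of RAM per compile job; this sketch derives a job count from /proc/meminfo (the 4 GB-per-job figure is an assumption, not an official number, and the build line is commented out):

```shell
# Derive a RAM-based MAX_JOBS: budget ~4 GB per parallel compile job
# (assumed figure; heavy translation units and NCCL device code can peak higher).
mem_kb=$(awk '/MemAvailable/ {print $2}' /proc/meminfo)
jobs=$(( mem_kb / (4 * 1024 * 1024) ))   # kB available / 4 GB per job
if [ "$jobs" -lt 1 ]; then jobs=1; fi    # never go below one job
echo "suggested MAX_JOBS=$jobs"
# MAX_JOBS=$jobs python setup.py develop   # actual build, commented out
```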

Indeed, the kernel log shows that a cc1plus process was killed by the OOM killer:

...
[Fri Jul 21 00:38:42 2023] Memory cgroup out of memory: Killed process 715198 (cc1plus) total-vm:36635740kB, anon-rss:33323980kB, file-rss:15516kB, shmem-rss:0kB, UID:0 pgtables:65444kB oom_score_adj:-997
...
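That kind of evidence can be pulled from the kernel log after the fact; a minimal check (reading dmesg may require root on some systems, hence the fallback message):

```shell
# Look for OOM-killer records in the kernel ring buffer; print a fallback
# message when nothing is visible (e.g. no privileges or a fresh boot).
oom=$(dmesg 2>/dev/null | grep -iE 'out of memory|oom-kill|killed process' || true)
if [ -n "$oom" ]; then
  printf '%s\n' "$oom"
else
  echo "no OOM records visible in dmesg"
fi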

But MAX_JOBS=1 does not seem to work either; the build hangs at an early step and makes no progress,
so I intend to add more memory and then verify again.

@ptrblck, much appreciated. The build with more memory succeeded.

Great! It’s still weird that MAX_JOBS=1 did not work.