Compilation hangs: too many threads

Hi,

Compiling PyTorch from source failed: my machine hangs because a huge number of threads are started during the build.
Is there a way to limit the number of threads when compiling PyTorch?
Thanks for any help.

PyTorch Version (e.g., 1.0): master
OS: openSUSE 15.2
How you installed PyTorch (conda, pip, source): source
Build command you used (if compiling from source): python3 setup.py install,
following the instructions here, that is:

git clone --recursive https://github.com/pytorch/pytorch

cd pytorch

git submodule sync

git submodule update --init --recursive

python3 setup.py install

Further info:
Python version: 3.6.12
CUDA/cuDNN version: 10.1

[…]


-- ******** Summary ********
-- General:
-- CMake version : 3.19.1
-- CMake command : /usr/bin/cmake
-- System : Linux
-- C++ compiler : /usr/bin/c++
-- C++ compiler id : GNU
-- C++ compiler version : 7.5.0
-- CXX flags : -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow
-- Build type : Release
-- Compile definitions : ONNX_ML=1;ONNXIFI_ENABLE_EXT=1;ONNX_NAMESPACE=onnx_torch;HAVE_MMAP=1;_FILE_OFFSET_BITS=64;HAVE_SHM_OPEN=1;HAVE_SHM_UNLINK=1;HAVE_MALLOC_USABLE_SIZE=1;USE_EXTERNAL_MZCRC;MINIZ_DISABLE_ZIP_READER_CRC32_CHECKS
-- CMAKE_PREFIX_PATH : /usr/lib/python3.6/site-packages;/usr/local/cuda-10.1
-- CMAKE_INSTALL_PREFIX : /home/laurent/pytorch/torch

-- TORCH_VERSION : 1.8.0
-- CAFFE2_VERSION : 1.8.0
-- BUILD_CAFFE2 : ON
-- BUILD_CAFFE2_OPS : ON
-- BUILD_CAFFE2_MOBILE : OFF
-- BUILD_STATIC_RUNTIME_BENCHMARK: OFF
-- BUILD_TENSOREXPR_BENCHMARK: OFF
-- BUILD_BINARY : OFF
-- BUILD_CUSTOM_PROTOBUF : ON
-- Link local protobuf : ON
-- BUILD_DOCS : OFF
-- BUILD_PYTHON : True
-- Python version : 3.6.12
-- Python executable : /usr/bin/python3
-- Pythonlibs version : 3.6.12
-- Python library : /usr/lib64/libpython3.6m.so.1.0
-- Python includes : /usr/include/python3.6m
-- Python site-packages: lib/python3.6/site-packages
-- BUILD_SHARED_LIBS : ON
-- CAFFE2_USE_MSVC_STATIC_RUNTIME : OFF
-- BUILD_TEST : True
-- BUILD_JNI : OFF
-- BUILD_MOBILE_AUTOGRAD : OFF
-- INTERN_BUILD_MOBILE :
-- USE_BLAS : 1
-- BLAS : open
-- USE_LAPACK : 1
-- LAPACK : open
-- USE_ASAN : OFF
-- USE_CPP_CODE_COVERAGE : OFF
-- USE_CUDA : ON
-- CUDA static link : OFF
-- USE_CUDNN : OFF
-- CUDA version : 10.1
-- CUDA root directory : /usr/local/cuda-10.1
-- CUDA library : /usr/local/cuda-10.1/lib64/stubs/libcuda.so
-- cudart library : /usr/local/cuda-10.1/lib64/libcudart.so
-- cublas library : /usr/lib64/libcublas.so
-- cufft library : /usr/local/cuda-10.1/lib64/libcufft.so
-- curand library : /usr/local/cuda-10.1/lib64/libcurand.so
-- nvrtc : /usr/local/cuda-10.1/lib64/libnvrtc.so
-- CUDA include path : /usr/local/cuda-10.1/include
-- NVCC executable : /usr/local/cuda-10.1/bin/nvcc
-- NVCC flags : -Xfatbin;-compress-all;-DONNX_NAMESPACE=onnx_torch;-gencode;arch=compute_35,code=sm_35;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_52,code=sm_52;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_75,code=compute_75;-Xcudafe;--diag_suppress=cc_clobber_ignored,--diag_suppress=integer_sign_change,--diag_suppress=useless_using_declaration,--diag_suppress=set_but_not_used,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=implicit_return_from_non_void_function,--diag_suppress=unsigned_compare_with_zero,--diag_suppress=declared_but_not_referenced,--diag_suppress=bad_friend_decl;-std=c++14;-Xcompiler;-fPIC;--expt-relaxed-constexpr;--expt-extended-lambda;-Wno-deprecated-gpu-targets;--expt-extended-lambda;-Xcompiler;-fPIC;-DCUDA_HAS_FP16=1;-D__CUDA_NO_HALF_OPERATORS__;-D__CUDA_NO_HALF_CONVERSIONS__;-D__CUDA_NO_BFLOAT16_CONVERSIONS__;-D__CUDA_NO_HALF2_OPERATORS__
-- CUDA host compiler : /usr/bin/cc
-- NVCC --device-c : OFF
-- USE_TENSORRT : OFF
-- USE_ROCM : OFF
-- USE_EIGEN_FOR_BLAS : ON
-- USE_FBGEMM : ON
-- USE_FAKELOWP : OFF
-- USE_KINETO : OFF
-- USE_FFMPEG : OFF
-- USE_GFLAGS : OFF
-- USE_GLOG : OFF
-- USE_LEVELDB : OFF
-- USE_LITE_PROTO : OFF
-- USE_LMDB : OFF
-- USE_METAL : OFF
-- USE_PYTORCH_METAL : OFF
-- USE_FFTW : OFF
-- USE_MKL : OFF
-- USE_MKLDNN : ON
-- USE_MKLDNN_CBLAS : OFF
-- USE_NCCL : ON
-- USE_SYSTEM_NCCL : OFF
-- USE_NNPACK : ON
-- USE_NUMPY : ON
-- USE_OBSERVERS : ON
-- USE_OPENCL : OFF
-- USE_OPENCV : OFF
-- USE_OPENMP : ON
-- USE_TBB : OFF
-- USE_VULKAN : OFF
-- USE_PROF : OFF
-- USE_QNNPACK : ON
-- USE_PYTORCH_QNNPACK : ON
-- USE_REDIS : OFF
-- USE_ROCKSDB : OFF
-- USE_ZMQ : OFF
-- USE_DISTRIBUTED : ON
-- USE_MPI : OFF
-- USE_GLOO : ON
-- USE_TENSORPIPE : ON
-- Public Dependencies : Threads::Threads;caffe2::mkldnn
-- Private Dependencies : pthreadpool;cpuinfo;qnnpack;pytorch_qnnpack;nnpack;XNNPACK;fbgemm;/usr/lib64/libnuma.so;fp16;gloo;tensorpipe;aten_op_header_gen;foxi_loader;rt;fmt::fmt-header-only;gcc_s;gcc;dl
-- Configuring done
-- Generating done
CMake Warning:
Manually-specified variables were not used by the project:

JAVA_HOME

-- Build files have been written to: /home/laurent/pytorch/build
Scanning dependencies of target nccl_external
[ 0%] Creating directories for 'nccl_external'
Scanning dependencies of target benchmark
[ 0%] Building CXX object third_party/benchmark/src/CMakeFiles/benchmark.dir/benchmark.cc.o
[ 0%] No download step for 'nccl_external'
[ 0%] No update step for 'nccl_external'
[ 0%] No patch step for 'nccl_external'
[ 0%] No configure step for 'nccl_external'
[ 0%] Performing build step for 'nccl_external'
make[3]: warning: -jN forced in submake: disabling jobserver mode.
Scanning dependencies of target clog
[ 0%] Building C object confu-deps/cpuinfo/deps/clog/CMakeFiles/clog.dir/src/clog.c.o
Grabbing include/nccl_net.h > /home/laurent/pytorch/build/nccl/include/nccl_net.h
Generating nccl.h.in > /home/laurent/pytorch/build/nccl/include/nccl.h
[ 0%] Linking C static library ../../../../lib/libclog.a
Generating nccl.pc.in > /home/laurent/pytorch/build/nccl/lib/pkgconfig/nccl.pc
Compiling init.cc > /home/laurent/pytorch/build/nccl/obj/init.o
Compiling bootstrap.cc > /home/laurent/pytorch/build/nccl/obj/bootstrap.o
Compiling channel.cc > /home/laurent/pytorch/build/nccl/obj/channel.o
Compiling transport.cc > /home/laurent/pytorch/build/nccl/obj/transport.o
Compiling enqueue.cc > /home/laurent/pytorch/build/nccl/obj/enqueue.o
Compiling group.cc > /home/laurent/pytorch/build/nccl/obj/group.o
Compiling debug.cc > /home/laurent/pytorch/build/nccl/obj/debug.o
Compiling proxy.cc > /home/laurent/pytorch/build/nccl/obj/proxy.o
Compiling misc/nvmlwrap.cc > /home/laurent/pytorch/build/nccl/obj/misc/nvmlwrap.o
Compiling misc/ibvwrap.cc > /home/laurent/pytorch/build/nccl/obj/misc/ibvwrap.o
Compiling misc/utils.cc > /home/laurent/pytorch/build/nccl/obj/misc/utils.o
Compiling misc/argcheck.cc > /home/laurent/pytorch/build/nccl/obj/misc/argcheck.o
Compiling transport/p2p.cc > /home/laurent/pytorch/build/nccl/obj/transport/p2p.o
Compiling transport/shm.cc > /home/laurent/pytorch/build/nccl/obj/transport/shm.o
Compiling transport/net.cc > /home/laurent/pytorch/build/nccl/obj/transport/net.o
Compiling transport/net_socket.cc > /home/laurent/pytorch/build/nccl/obj/transport/net_socket.o
Compiling transport/net_ib.cc > /home/laurent/pytorch/build/nccl/obj/transport/net_ib.o
Compiling transport/coll_net.cc > /home/laurent/pytorch/build/nccl/obj/transport/coll_net.o
Compiling collectives/sendrecv.cc > /home/laurent/pytorch/build/nccl/obj/collectives/sendrecv.o
Compiling collectives/all_reduce.cc > /home/laurent/pytorch/build/nccl/obj/collectives/all_reduce.o
Compiling collectives/all_gather.cc > /home/laurent/pytorch/build/nccl/obj/collectives/all_gather.o
Compiling collectives/broadcast.cc > /home/laurent/pytorch/build/nccl/obj/collectives/broadcast.o
Compiling collectives/reduce.cc > /home/laurent/pytorch/build/nccl/obj/collectives/reduce.o
Compiling collectives/reduce_scatter.cc > /home/laurent/pytorch/build/nccl/obj/collectives/reduce_scatter.o
Compiling graph/topo.cc > /home/laurent/pytorch/build/nccl/obj/graph/topo.o
Compiling graph/paths.cc > /home/laurent/pytorch/build/nccl/obj/graph/paths.o
Compiling graph/search.cc > /home/laurent/pytorch/build/nccl/obj/graph/search.o
Compiling graph/connect.cc > /home/laurent/pytorch/build/nccl/obj/graph/connect.o
Compiling graph/rings.cc > /home/laurent/pytorch/build/nccl/obj/graph/rings.o
Compiling graph/trees.cc > /home/laurent/pytorch/build/nccl/obj/graph/trees.o
Compiling graph/tuning.cc > /home/laurent/pytorch/build/nccl/obj/graph/tuning.o
Compiling graph/xml.cc > /home/laurent/pytorch/build/nccl/obj/graph/xml.o
Generating rules > /home/laurent/pytorch/build/nccl/obj/collectives/device/Makefile.rules
[ 0%] Built target clog
Scanning dependencies of target defs.bzl
[ 0%] Built target defs.bzl
[ 0%] Building CXX object third_party/benchmark/src/CMakeFiles/benchmark.dir/benchmark_register.cc.o
Scanning dependencies of target pthreadpool
[ 0%] Building C object confu-deps/pthreadpool/CMakeFiles/pthreadpool.dir/src/legacy-api.c.o
In file included from /home/laurent/pytorch/third_party/pthreadpool/src/legacy-api.c:8:0:
/home/laurent/pytorch/third_party/pthreadpool/include/pthreadpool.h:997:2: warning: ‘pthreadpool_function_1d_t’ is deprecated [-Wdeprecated-declarations]
pthreadpool_function_1d_t function,
^~~~~~~~~~~~~~~~~~~~~~~~~
/home/laurent/pytorch/third_party/pthreadpool/include/pthreadpool.h:1003:2: warning: ‘pthreadpool_function_1d_tiled_t’ is deprecated [-Wdeprecated-declarations]
pthreadpool_function_1d_tiled_t function,
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

[ many of these ]
Scanning dependencies of target gtest
[ 0%] Building CXX object third_party/googletest/googlemock/gtest/CMakeFiles/gtest.dir/src/gtest-all.cc.o
Scanning dependencies of target asmjit
[ 0%] Building CXX object third_party/fbgemm/asmjit/CMakeFiles/asmjit.dir/src/asmjit/core/arch.cpp.o
[ 0%] Building CXX object third_party/fbgemm/asmjit/CMakeFiles/asmjit.dir/src/asmjit/core/assembler.cpp.o
[ 0%] Building C object confu-deps/pthreadpool/CMakeFiles/pthreadpool.dir/src/memory.c.o
In file included from /home/laurent/pytorch/third_party/pthreadpool/src/threadpool-object.h:30:0,
from /home/laurent/pytorch/third_party/pthreadpool/src/memory.c:19:
/home/laurent/pytorch/third_party/pthreadpool/include/pthreadpool.h:997:2: warning: ‘pthreadpool_function_1d_t’ is deprecated [-Wdeprecated-declarations]
pthreadpool_function_1d_t function,
^~~~~~~~~~~~~~~~~~~~~~~~~
/home/laurent/pytorch/third_party/pthreadpool/include/pthreadpool.h:1003:2: warning: ‘pthreadpool_function_1d_tiled_t’ is deprecated [-Wdeprecated-declarations]
pthreadpool_function_1d_tiled_t function,

[ many of these ]
[ 0%] Building C object confu-deps/pthreadpool/CMakeFiles/pthreadpool.dir/src/pthreads.c.o
[ 0%] Building CXX object third_party/fbgemm/asmjit/CMakeFiles/asmjit.dir/src/asmjit/core/builder.cpp.o
In file included from /home/laurent/pytorch/third_party/pthreadpool/src/pthreads.c:50:0:
/home/laurent/pytorch/third_party/pthreadpool/include/pthreadpool.h:997:2: warning: ‘pthreadpool_function_1d_t’ is deprecated [-Wdeprecated-declarations]
pthreadpool_function_1d_t function,

[ many of these ]

[ 0%] Building C object confu-deps/pthreadpool/CMakeFiles/pthreadpool.dir/src/fastpath.c.o
In file included from /home/laurent/pytorch/third_party/pthreadpool/src/fastpath.c:16:0:
/home/laurent/pytorch/third_party/pthreadpool/include/pthreadpool.h:997:2: warning: ‘pthreadpool_function_1d_t’ is deprecated [-Wdeprecated-declarations]
pthreadpool_function_1d_t function,
^~~~~~~~~~~~~~~~~~~~~~~~~
/home/laurent/pytorch/third_party/pthreadpool/include/pthreadpool.h:1003:2: warning: ‘pthreadpool_function_1d_tiled_t’ is deprecated [-Wdeprecated-declarations]
pthreadpool_function_1d_tiled_t function,

Scanning dependencies of target libprotobuf-lite
Scanning dependencies of target libprotobuf
[ 0%] Building CXX object third_party/fbgemm/asmjit/CMakeFiles/asmjit.dir/src/asmjit/core/callconv.cpp.o
[ 0%] Building CXX object third_party/protobuf/cmake/CMakeFiles/libprotobuf-lite.dir//src/google/protobuf/any_lite.cc.o
[ 0%] Building CXX object third_party/protobuf/cmake/CMakeFiles/libprotobuf.dir//src/google/protobuf/any_lite.cc.o
[ 0%] Building CXX object third_party/fbgemm/asmjit/CMakeFiles/asmjit.dir/src/asmjit/core/codeholder.cpp.o
[ 0%] Linking C static library ../../lib/libpthreadpool.a
[ 0%] Building CXX object third_party/fbgemm/asmjit/CMakeFiles/asmjit.dir/src/asmjit/core/compiler.cpp.o
Compiling functions.cu > /home/laurent/pytorch/build/nccl/obj/collectives/device/functions.o
Compiling all_gather.cu > /home/laurent/pytorch/build/nccl/obj/collectives/device/all_gather_sum_i8.o
Compiling all_gather.cu > /home/laurent/pytorch/build/nccl/obj/collectives/device/all_gather_sum_u8.o
Compiling all_gather.cu > /home/laurent/pytorch/build/nccl/obj/collectives/device/all_gather_sum_i32.o
Compiling all_gather.cu > /home/laurent/pytorch/build/nccl/obj/collectives/device/all_gather_sum_u32.o
Compiling all_gather.cu > /home/laurent/pytorch/build/nccl/obj/collectives/device/all_gather_sum_i64.o
[ 0%] Built target pthreadpool

[ I killed the compilation before it froze my machine ]

Set the MAX_JOBS environment variable (ninja's default job count is derived from the number of CPU cores; it is 6 on a 4-core machine).
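
As a minimal sketch of that suggestion, the snippet below picks a job cap and exports it before the build is started. The helper name pick_max_jobs and the "half the cores" fallback are assumptions made for illustration, not part of PyTorch; only MAX_JOBS itself comes from this thread. Sub-builds that force their own -j (see the "-jN forced in submake" warning in the log above) may not honour the cap.

```shell
# Sketch: cap the number of parallel build jobs before building PyTorch.
# pick_max_jobs is a hypothetical helper, not something PyTorch provides.
pick_max_jobs() {
    # Number of cores reported by the system (fall back to 1 if nproc is missing).
    cores=$(nproc 2>/dev/null || echo 1)
    # Use the requested cap if one is given, otherwise half the cores (an assumption).
    cap=${1:-$(( cores / 2 ))}
    # Clamp the result to the range [1, cores].
    [ "$cap" -lt 1 ] && cap=1
    [ "$cap" -gt "$cores" ] && cap=$cores
    echo "$cap"
}

export MAX_JOBS="$(pick_max_jobs 4)"
echo "Building with MAX_JOBS=$MAX_JOBS"
# Then, from the pytorch checkout:
#   python3 setup.py install
```

The variable has to be exported in the same shell that runs python3 setup.py install, otherwise the build does not see it.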

With
export MAX_JOBS=4
it didn’t work as expected. Here is a screenshot of the many threads started by CMake (v3.19.1):

There is no concurrent compilation running.
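
One way to check this from a second terminal is to count the compiler processes that are currently alive. The following is only a sketch: count_procs is a hypothetical helper, the list of process names (cc1, cc1plus, nvcc, cicc) is an assumption about what a GCC plus CUDA build spawns, and reading /proc makes it Linux-only.

```shell
# Sketch: count running compiler processes to see how parallel the build really is.
# count_procs is a hypothetical helper; the process names below are assumptions.
count_procs() {
    # Print how many running processes have a command name matching $1 exactly.
    # Processes that exit while we read /proc are silently skipped.
    cat /proc/[0-9]*/comm 2>/dev/null | grep -cE "^($1)\$" || true
}

echo "compiler processes: $(count_procs 'cc1|cc1plus|nvcc|cicc')"
```

If the count stays well above MAX_JOBS while no other build is running, some part of the build is probably ignoring the limit.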

Perhaps it is this

Exactly! Big thanks!
It is curious that the fix has not been merged. Or has it been merged and is simply not working? I will comment on the thread you mentioned.