Error when building pytorch1.7.0 from source

In order to use NCCL 2.11.4, I am compiling and installing PyTorch 1.7.0 from source.
My system environment is as follows:

WSL2 Ubuntu 20.04
CUDA 11.0
GPU: TITAN V
Driver version: 470.05
cuDNN 8.0.5
Python 3.8.5

The commands used for installation are as follows:

git clone --recursive https://github.com/pytorch/pytorch
cd pytorch
git checkout v1.7.0
git submodule sync
git submodule update --init --recursive
export USE_SYSTEM_NCCL=1
export USE_MKLDNN=0
export MAX_JOBS=16
python3 setup.py install
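(For reference, before building against a system NCCL it can help to confirm which NCCL version the build will actually see. A minimal sketch, assuming the header lives at the default system location; adjust the path if your NCCL is installed elsewhere:)

```shell
# Print the NCCL version macros from the system header, if present.
NCCL_HEADER=/usr/include/nccl.h   # assumed default location; adjust as needed
if [ -f "$NCCL_HEADER" ]; then
    grep -E "#define NCCL_(MAJOR|MINOR|PATCH)" "$NCCL_HEADER"
else
    echo "nccl.h not found at $NCCL_HEADER"
fi
```

If the macros don't report the expected 2.11.4, the build would silently link against a different NCCL than intended.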

The configuration during compilation is as follows:

-- General:
-- CMake version : 3.16.3
-- CMake command : /usr/bin/cmake
-- System : Linux
-- C++ compiler : /usr/bin/c++
-- C++ compiler id : GNU
-- C++ compiler version : 9.3.0
-- BLAS : MKL
-- CXX flags : -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DUSE_VULKAN_WRAPPER -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow
-- Build type : Release
-- Compile definitions : ONNX_ML=1;ONNXIFI_ENABLE_EXT=1;ONNX_NAMESPACE=onnx_torch;HAVE_MMAP=1;_FILE_OFFSET_BITS=64;HAVE_SHM_OPEN=1;HAVE_SHM_UNLINK=1;HAVE_MALLOC_USABLE_SIZE=1;USE_EXTERNAL_MZCRC;MINIZ_DISABLE_ZIP_READER_CRC32_CHECKS
-- CMAKE_PREFIX_PATH : /usr/lib/python3/dist-packages;/usr/local/cuda-11.0;/usr/local/cuda-11.0
-- CMAKE_INSTALL_PREFIX : /mnt/d/pytorch/torch

-- TORCH_VERSION : 1.7.0
-- CAFFE2_VERSION : 1.7.0
-- BUILD_CAFFE2 : ON
-- BUILD_CAFFE2_OPS : ON
-- BUILD_CAFFE2_MOBILE : OFF
-- BUILD_STATIC_RUNTIME_BENCHMARK: OFF
-- BUILD_BINARY : OFF
-- BUILD_CUSTOM_PROTOBUF : ON
-- Link local protobuf : ON
-- BUILD_DOCS : OFF
-- BUILD_PYTHON : True
-- Python version : 3.8.5
-- Python executable : /usr/bin/python3
-- Pythonlibs version : 3.8.5
-- Python library : /usr/lib/libpython3.8.so.1.0
-- Python includes : /usr/include/python3.8
-- Python site-packages: lib/python3/dist-packages
-- BUILD_SHARED_LIBS : ON
-- BUILD_TEST : True
-- BUILD_JNI : OFF
-- BUILD_MOBILE_AUTOGRAD : OFF
-- INTERN_BUILD_MOBILE :
-- USE_ASAN : OFF
-- USE_CPP_CODE_COVERAGE : OFF
-- USE_CUDA : ON
-- CUDA static link : OFF
-- USE_CUDNN : ON
-- CUDA version : 11.0
-- cuDNN version : 8.0.5
-- CUDA root directory : /usr/local/cuda-11.0
-- CUDA library : /usr/local/cuda-11.0/lib64/stubs/libcuda.so
-- cudart library : /usr/local/cuda-11.0/lib64/libcudart.so
-- cublas library : /usr/local/cuda-11.0/lib64/libcublas.so
-- cufft library : /usr/local/cuda-11.0/lib64/libcufft.so
-- curand library : /usr/local/cuda-11.0/lib64/libcurand.so
-- cuDNN library : /usr/local/cuda-11.0/lib64/libcudnn.so
-- nvrtc : /usr/local/cuda-11.0/lib64/libnvrtc.so
-- CUDA include path : /usr/local/cuda-11.0/include
-- NVCC executable : /usr/local/cuda-11.0/bin/nvcc
-- NVCC flags : -Xfatbin;-compress-all;-DONNX_NAMESPACE=onnx_torch;-gencode;arch=compute_70,code=sm_70;-Xcudafe;--diag_suppress=cc_clobber_ignored;-Xcudafe;--diag_suppress=integer_sign_change;-Xcudafe;--diag_suppress=useless_using_declaration;-Xcudafe;--diag_suppress=set_but_not_used;-Xcudafe;--diag_suppress=field_without_dll_interface;-Xcudafe;--diag_suppress=base_class_has_different_dll_interface;-Xcudafe;--diag_suppress=dll_interface_conflict_none_assumed;-Xcudafe;--diag_suppress=dll_interface_conflict_dllexport_assumed;-Xcudafe;--diag_suppress=implicit_return_from_non_void_function;-Xcudafe;--diag_suppress=unsigned_compare_with_zero;-Xcudafe;--diag_suppress=declared_but_not_referenced;-Xcudafe;--diag_suppress=bad_friend_decl;-std=c++14;-Xcompiler;-fPIC;--expt-relaxed-constexpr;--expt-extended-lambda;-Wno-deprecated-gpu-targets;--expt-extended-lambda;-Xcompiler;-fPIC;-DCUDA_HAS_FP16=1;-D__CUDA_NO_HALF_OPERATORS__;-D__CUDA_NO_HALF_CONVERSIONS__;-D__CUDA_NO_HALF2_OPERATORS__
-- CUDA host compiler : /usr/bin/cc
-- NVCC --device-c : OFF
-- USE_TENSORRT : OFF
-- USE_ROCM : OFF
-- USE_EIGEN_FOR_BLAS : ON
-- USE_FBGEMM : ON
-- USE_FAKELOWP : OFF
-- USE_FFMPEG : OFF
-- USE_GFLAGS : OFF
-- USE_GLOG : OFF
-- USE_LEVELDB : OFF
-- USE_LITE_PROTO : OFF
-- USE_LMDB : OFF
-- USE_METAL : OFF
-- USE_MKL : OFF
-- USE_MKLDNN : 0
-- USE_NCCL : ON
-- USE_SYSTEM_NCCL : 1
-- USE_NNPACK : ON
-- USE_NUMPY : ON
-- USE_OBSERVERS : ON
-- USE_OPENCL : OFF
-- USE_OPENCV : OFF
-- USE_OPENMP : ON
-- USE_TBB : OFF
-- USE_VULKAN : OFF
-- USE_PROF : OFF
-- USE_QNNPACK : ON
-- USE_PYTORCH_QNNPACK : ON
-- USE_REDIS : OFF
-- USE_ROCKSDB : OFF
-- USE_ZMQ : OFF
-- USE_DISTRIBUTED : ON
-- USE_MPI : ON
-- USE_GLOO : ON
-- USE_TENSORPIPE : ON
-- Public Dependencies : Threads::Threads
-- Private Dependencies : pthreadpool;cpuinfo;qnnpack;pytorch_qnnpack;nnpack;XNNPACK;fbgemm;/usr/lib/x86_64-linux-gnu/libnuma.so;fp16;/usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi_cxx.so;/usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so;gloo;tensorpipe;aten_op_header_gen;foxi_loader;rt;fmt::fmt-header-only;gcc_s;gcc;dl
-- Configuring done

However, every compilation attempt stalls at 50%. Here is a screenshot of where it stops:

There is no error message; the build simply stops making progress. The computer does not freeze, and the compilation can be interrupted manually with Ctrl + C, but without manual interruption it stays stuck at this point indefinitely.
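(For reference, one quick way to tell whether such a build is truly hung or just slow is to look for live compiler and build-driver processes; the process names below are the usual suspects for a GCC/nvcc toolchain:)

```shell
# List compiler/build processes with elapsed time and CPU usage; if they sit
# near 0% CPU for a long time, the build is likely stuck rather than slow.
ps -eo pid,etime,pcpu,comm | grep -E "cc1plus|nvcc|cicc|ninja" || echo "no compiler processes running"
```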

How can I solve this problem?

Could you check if you might be running out of host memory and are seeing a massive slowdown due to the potential usage of swap space?
If so, limit the number of workers via MAX_JOBS=n python setup.py install.
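For example, a rough sketch for picking a lower job count from the core count (run inside the pytorch checkout; the actual build invocation is left commented out here):

```shell
# Derive a conservative parallelism level: half the available cores, at least 1.
JOBS=$(( $(nproc) / 2 ))
[ "$JOBS" -lt 1 ] && JOBS=1
echo "building with MAX_JOBS=$JOBS"
# MAX_JOBS=$JOBS python3 setup.py install
```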

Thank you for your help. My machine has 128 GB of RAM, and the build uses a steady ~40 GB. My CPU is a Xeon E5-2650 v4 with 32 cores in total. I build with MAX_JOBS=16; CPU usage is about 40% during compilation and drops to about 9% after the build stalls.

OK, that doesn’t seem to be the root cause then.
Was building PyTorch from source working before at one point? If so, did you change anything in the build stack (GCC etc.) or is this your first build?
Could you also run python setup.py clean and try to rebuild afterwards?

I have compiled three times. After the first build stalled, I waited about two hours and then ended the process with Ctrl + C. I suspected MAX_JOBS was the problem, so after running python3 setup.py clean I compiled a second time, still with MAX_JOBS=16, and it stalled at the same position. I then ran python3 setup.py clean again and compiled a third time; this time I suspected MKLDNN, so I set USE_MKLDNN=0, but that didn't help either. That is the whole process; I haven't changed anything else.

OK, thanks for the description. Given this it seems that you were never successful in building from source in this setup. As a further debugging step: would it be possible for you to use a clean docker container and try to build PyTorch from source there? If this would be working then the issue would potentially point to a setup issue which causes the slowdown/hang.

OK, I’ll try. In addition, I would like to ask: when compiling, I did not uninstall the torch 1.7.1+cu110 package that pip3 installed from the PyTorch website. Will this affect the compilation?

No, it shouldn’t affect the compilation, but it could potentially “mask” your source build once it’s successfully built, i.e. import torch could load the pip binaries instead of the source build, since both packages are installed in the same environment.
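One way to check which installation Python would pick up is importlib.util.find_spec, which resolves a module's location without importing it (shown here with a stdlib module as a stand-in for torch; the same call with "torch" shows which of the two installs wins):

```python
import importlib.util

# find_spec reports where Python would load a package from, without importing it.
# With two installs of the same package (pip wheel vs. source build) in one
# environment, the path printed here is the one "import" will actually use.
spec = importlib.util.find_spec("json")  # stand-in for "torch"
print(spec.origin)
```

After installing the source build, running python3 -c "import torch; print(torch.__file__)" similarly reveals which copy of torch is being loaded.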

I tried again and solved the problem. When the compilation stops at 50%, just wait: after about 2.5 hours it continues, and the whole build finished within 12 hours. Thank you for your patient help and guidance. @ptrblck

In addition, a suggestion: could the next version of PyTorch (perhaps 1.11 or 1.10.1?) bundle NCCL 2.11.4, so that all WSL2 users can simply pip3 install it instead of spending a lot of time compiling from source?

This is still highly unusual, as I’ve never seen this behavior on a proper system (I have seen it on a Raspberry Pi, which needed to use its swap heavily, so there it was expected).

Could you create a feature request on GitHub for the NCCL update so that we can revisit and check in which release it could land, please?

Of course, I have submitted a feature request on GitHub: Upgrade NCCL 2.11.4 in the new PyTorch · Issue #68165 · pytorch/pytorch