NanCheck.cu.o - ninja: build stopped: subcommand failed

Hello,

I’m trying to build PyTorch from source for 2x Tesla K80s with CUDA 11.4, but I get this error:

[7075/7977] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/torch/csrc/distributed/c10d/NanCheck.cu.o
ninja: build stopped: subcommand failed.

I have:

NVIDIA-SMI 470.256.02 Driver Version: 470.256.02 CUDA Version: 11.4
Cuda compilation tools, release 11.4, V11.4.100
Build cuda_11.4.r11.4/compiler.30188945_0
gcc (GCC) 9.3.0
g++ (GCC) 9.3.0
numpy 1.19.5
Python 3.9.20
Ninja version 1.11.1.git.kitware.jobserver-1

I have tried every GCC version from 9.3.0 through 11 and get the same error.

I have also tried to solve this with both Copilot and ChatGPT, but nothing works. If anyone can offer support, I would really appreciate it.

Thank you.

The build log should show an earlier error that is causing ninja to fail, so search for any failures before the output you posted.
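For example, you can capture the full build output to a file and look for the first compiler error (a rough sketch; build.log is just an arbitrary file name):

# Re-run the build, keeping a copy of everything printed to the terminal
python3 setup.py install 2>&1 | tee build.log
# Show the first compiler errors rather than the final ninja summary
grep -n "error:" build.log | head -n 20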


Hello,

Thank you so much for your reply.
After many hours of trying to fix various errors, I am back where I started: the build stops at the same error. I tried various cuDNN versions and tried changing/adding things in the CMakeLists.txt file.

Nothing helped.

I can’t figure out what’s wrong with it. I would really appreciate any help or direction. Some of the output on my screen is:

devtest01:~/pytorch$ python3 setup.py install
Building wheel torch-2.6.0a0+gitd2ec289

-- Building version 2.6.0a0+gitd2ec289
cmake -GNinja -DBUILD_PYTHON=True -DBUILD_TEST=True -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=/home/mike/pytorch/torch -DCMAKE_PREFIX_PATH=/usr/lib/python3.9/site-packages;/usr -DPython_EXECUTABLE=/usr/bin/python3 -DTORCH_BUILD_VERSION=2.6.0a0+gitd2ec289 -DTORCH_CUDA_ARCH_LIST=3.7+PTX -DUSE_NUMPY=True /home/mike/pytorch
-- The CXX compiler identification is GNU 9.3.0
-- The C compiler identification is GNU 9.3.0
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- /usr/bin/c++ /home/mike/pytorch/torch/abi-check.cpp -o /home/mike/pytorch/build/abi-check
-- Determined _GLIBCXX_USE_CXX11_ABI=1
-- No SVE processor on this machine.
-- Compiler does not support SVE extension. Will not build perfkernels.
-- Found CUDA: /usr/local/cuda-11.4 (found version "11.4")
-- The CUDA compiler identification is NVIDIA 11.4.100
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Check for working CUDA compiler: /usr/local/cuda-11.4/bin/nvcc - skipped
-- Detecting CUDA compile features
-- Detecting CUDA compile features - done
-- Found CUDAToolkit: /usr/local/cuda-11.4/include (found version "11.4.100")
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE
-- Caffe2: CUDA detected: 11.4
-- Caffe2: CUDA nvcc is: /usr/local/cuda-11.4/bin/nvcc
-- Caffe2: CUDA toolkit directory: /usr/local/cuda-11.4
-- Caffe2: Header version is: 11.4
-- Found Python: /usr/bin/python3 (found version "3.9.20") found components: Interpreter

CMake Warning at cmake/public/cuda.cmake:140 (message):
  Failed to compute shorthash for libnvrtc.so
Call Stack (most recent call first):
  cmake/Dependencies.cmake:44 (include)
  CMakeLists.txt:853 (include)

-- Found nvtx3: /home/mike/pytorch/third_party/NVTX/c/include
-- Found CUDNN: /usr/local/cuda-11.4/lib64/libcudnn.so
-- Found CUSPARSELT: /usr/lib/x86_64-linux-gnu/libcusparseLt.so
-- Found CUDSS: /usr/lib/x86_64-linux-gnu/libcudss.so
-- USE_CUFILE is set to 0. Compiling without cuFile support
-- Added CUDA NVCC flags for: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_37,code=compute_37

CMake Warning at cmake/Dependencies.cmake:95 (message):
  Not compiling with XPU. Could NOT find SYCL. Suppress this warning with
  -DUSE_XPU=OFF.
Call Stack (most recent call first):
  CMakeLists.txt:853 (include)

-- Building using own protobuf under third_party per request.
-- Use custom protobuf build.
CMake Deprecation Warning at third_party/protobuf/cmake/CMakeLists.txt:2 (cmake_minimum_required):
  Compatibility with CMake < 3.5 will be removed from a future version of
  CMake.

  Update the VERSION argument <min> value or use a ...<max> suffix to tell
  CMake that the project does not need compatibility with older versions.
-- 3.13.0.0

-- Performing Test protobuf_HAVE_BUILTIN_ATOMICS
-- Performing Test protobuf_HAVE_BUILTIN_ATOMICS - Success
-- Caffe2 protobuf include directory: $<BUILD_INTERFACE:/home/mike/pytorch/third_party/protobuf/src>$<INSTALL_INTERFACE:include>
-- Trying to find preferred BLAS backend of choice: MKL
-- MKL_THREADING = OMP
/home/mike/pytorch/aten/src/ATen/cuda/Exceptions.h:111:48: error: ‘cusolverStatus_t’ was not declared in this scope; did you mean ‘cusparseStatus_t’?
111 | C10_EXPORT const char* cusolverGetErrorMessage(cusolverStatus_t status);
| ^~~~~~~~~~~~~~~~
| cusparseStatus_t
[7052/7981] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/quantized/cudnn/Linear.cpp.o
FAILED: caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/quantized/cudnn/Linear.cpp.o
/usr/bin/ccache /usr/bin/c++ -DAT_PER_OPERATOR_HEADERS -DFLASHATTENTION_DISABLE_ALIBI -DFMT_HEADER_ONLY=1 -DHAVE_MALLOC_USABLE_SIZE=1 -DHAVE_MMAP=1 -DHAVE_SHM_OPEN=1 -DHAVE_SHM_UNLINK=1 -DIDEEP_USE_MKL -DMINIZ_DISABLE_ZIP_READER_CRC32_CHECKS -DONNXIFI_ENABLE_EXT=1 -DONNX_ML=1 -DONNX_NAMESPACE=onnx_torch -DTORCH_CUDA_BUILD_MAIN_LIB -DTORCH_CUDA_USE_NVTX3 -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_CUDA -DUSE_CUDSS -DUSE_CUSPARSELT -DUSE_DISTRIBUTED -DUSE_EXTERNAL_MZCRC -DUSE_FLASH_ATTENTION -DUSE_MEM_EFF_ATTENTION -DUSE_NCCL -DUSE_RPC -DUSE_TENSORPIPE -D_FILE_OFFSET_BITS=64 -Dtorch_cuda_EXPORTS -I/home/mike/pytorch/build/aten/src -I/home/mike/pytorch/aten/src -I/home/mike/pytorch/build -I/home/mike/pytorch -I/home/mike/pytorch/cmake/…/third_party/benchmark/include -I/home/mike/pytorch/third_party/onnx -I/home/mike/pytorch/build/third_party/onnx -I/home/mike/pytorch/nlohmann -I/home/mike/pytorch/aten/src/THC -I/home/mike/pytorch/aten/src/ATen/cuda -I/home/mike/pytorch/third_party/fmt/include -I/home/mike/pytorch/aten/src/ATen/…/…/…/third_party/cutlass/include -I/home/mike/pytorch/aten/src/ATen/…/…/…/third_party/cutlass/tools/util/include -I/home/mike/pytorch/build/caffe2/aten/src -I/home/mike/pytorch/aten/src/ATen/… -I/home/mike/pytorch/build/nccl/include -I/home/mike/pytorch/c10/cuda/…/… -I/home/mike/pytorch/c10/… -I/home/mike/pytorch/third_party/tensorpipe -I/home/mike/pytorch/build/third_party/tensorpipe -I/home/mike/pytorch/third_party/tensorpipe/third_party/libnop/include -I/home/mike/pytorch/torch/csrc/api -I/home/mike/pytorch/torch/csrc/api/include -isystem /home/mike/pytorch/build/third_party/gloo -isystem /home/mike/pytorch/cmake/…/third_party/gloo -isystem /home/mike/pytorch/cmake/…/third_party/tensorpipe/third_party/libuv/include -isystem /home/mike/pytorch/cmake/…/third_party/googletest/googlemock/include -isystem /home/mike/pytorch/cmake/…/third_party/googletest/googletest/include -isystem /home/mike/pytorch/third_party/protobuf/src -isystem /usr/include/mkl -isystem /home/mike/pytorch/third_party/XNNPACK/include -isystem /home/mike/pytorch/third_party/ittapi/include -isystem /home/mike/pytorch/cmake/…/third_party/eigen -isystem /usr/local/cuda-11.4/include -isystem /home/mike/pytorch/third_party/ideep/mkl-dnn/include/oneapi/dnnl -isystem /home/mike/pytorch/third_party/ideep/include -isystem /home/mike/pytorch/INTERFACE -isystem /home/mike/pytorch/third_party/nlohmann/include -isystem /home/mike/pytorch/third_party/NVTX/c/include -isystem /home/mike/pytorch/cmake/…/third_party/cudnn_frontend/include -D_GLIBCXX_USE_CXX11_ABI=1 -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOROCTRACER -DLIBKINETO_NOXPUPTI=ON -DUSE_FBGEMM -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-unused-parameter -Wno-strict-overflow -Wno-strict-aliasing -Wno-stringop-overflow -Wsuggest-override -Wno-psabi -Wno-error=old-style-cast -Wno-missing-braces -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow -DHAVE_AVX512_CPU_DEFINITION -DHAVE_AVX2_CPU_DEFINITION -O3 -DNDEBUG -DNDEBUG -std=gnu++17 -fPIC -DMKL_HAS_SBGEMM -DTORCH_USE_LIBUV -DCAFFE2_USE_GLOO -Wall -Wextra -Wdeprecated -Wno-unused-parameter -Wno-missing-field-initializers -Wno-type-limits 
-Wno-array-bounds -Wno-unknown-pragmas -Wno-strict-overflow -Wno-strict-aliasing -Wunused-function -Wunused-variable -Wunused-but-set-variable -Wno-maybe-uninitialized -fvisibility=hidden -O2 -MD -MT caffe2/CMakeFiles/torch_cuda.dir//aten/src/ATen/native/quantized/cudnn/Linear.cpp.o -MF caffe2/CMakeFiles/torch_cuda.dir//aten/src/ATen/native/quantized/cudnn/Linear.cpp.o.d -o caffe2/CMakeFiles/torch_cuda.dir//aten/src/ATen/native/quantized/cudnn/Linear.cpp.o -c /home/mike/pytorch/aten/src/ATen/native/quantized/cudnn/Linear.cpp
In file included from /home/mike/pytorch/aten/src/ATen/native/quantized/cudnn/Linear.cpp:9:
/home/mike/pytorch/aten/src/ATen/cuda/Exceptions.h:111:48: error: ‘cusolverStatus_t’ was not declared in this scope; did you mean ‘cusparseStatus_t’?
111 | C10_EXPORT const char* cusolverGetErrorMessage(cusolverStatus_t status);
| ^~~~~~~~~~~~~~~~
| cusparseStatus_t
[7055/7981] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/quantized/cudnn/Conv.cpp.o
FAILED: caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/quantized/cudnn/Conv.cpp.o
/usr/bin/ccache /usr/bin/c++ -DAT_PER_OPERATOR_HEADERS -DFLASHATTENTION_DISABLE_ALIBI -DFMT_HEADER_ONLY=1 -DHAVE_MALLOC_USABLE_SIZE=1 -DHAVE_MMAP=1 -DHAVE_SHM_OPEN=1 -DHAVE_SHM_UNLINK=1 -DIDEEP_USE_MKL -DMINIZ_DISABLE_ZIP_READER_CRC32_CHECKS -DONNXIFI_ENABLE_EXT=1 -DONNX_ML=1 -DONNX_NAMESPACE=onnx_torch -DTORCH_CUDA_BUILD_MAIN_LIB -DTORCH_CUDA_USE_NVTX3 -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_CUDA -DUSE_CUDSS -DUSE_CUSPARSELT -DUSE_DISTRIBUTED -DUSE_EXTERNAL_MZCRC -DUSE_FLASH_ATTENTION -DUSE_MEM_EFF_ATTENTION -DUSE_NCCL -DUSE_RPC -DUSE_TENSORPIPE -D_FILE_OFFSET_BITS=64 -Dtorch_cuda_EXPORTS -I/home/mike/pytorch/build/aten/src -I/home/mike/pytorch/aten/src -I/home/mike/pytorch/build -I/home/mike/pytorch -I/home/mike/pytorch/cmake/…/third_party/benchmark/include -I/home/mike/pytorch/third_party/onnx -I/home/mike/pytorch/build/third_party/onnx -I/home/mike/pytorch/nlohmann -I/home/mike/pytorch/aten/src/THC -I/home/mike/pytorch/aten/src/ATen/cuda -I/home/mike/pytorch/third_party/fmt/include -I/home/mike/pytorch/aten/src/ATen/…/…/…/third_party/cutlass/include -I/home/mike/pytorch/aten/src/ATen/…/…/…/third_party/cutlass/tools/util/include -I/home/mike/pytorch/build/caffe2/aten/src -I/home/mike/pytorch/aten/src/ATen/… -I/home/mike/pytorch/build/nccl/include -I/home/mike/pytorch/c10/cuda/…/… -I/home/mike/pytorch/c10/… -I/home/mike/pytorch/third_party/tensorpipe -I/home/mike/pytorch/build/third_party/tensorpipe -I/home/mike/pytorch/third_party/tensorpipe/third_party/libnop/include -I/home/mike/pytorch/torch/csrc/api -I/home/mike/pytorch/torch/csrc/api/include -isystem /home/mike/pytorch/build/third_party/gloo -isystem /home/mike/pytorch/cmake/…/third_party/gloo -isystem /home/mike/pytorch/cmake/…/third_party/tensorpipe/third_party/libuv/include -isystem /home/mike/pytorch/cmake/…/third_party/googletest/googlemock/include -isystem /home/mike/pytorch/cmake/…/third_party/googletest/googletest/include -isystem /home/mike/pytorch/third_party/protobuf/src -isystem /usr/include/mkl -isystem /home/mike/pytorch/third_party/XNNPACK/include -isystem /home/mike/pytorch/third_party/ittapi/include -isystem /home/mike/pytorch/cmake/…/third_party/eigen -isystem /usr/local/cuda-11.4/include -isystem /home/mike/pytorch/third_party/ideep/mkl-dnn/include/oneapi/dnnl -isystem /home/mike/pytorch/third_party/ideep/include -isystem /home/mike/pytorch/INTERFACE -isystem /home/mike/pytorch/third_party/nlohmann/include -isystem /home/mike/pytorch/third_party/NVTX/c/include -isystem /home/mike/pytorch/cmake/…/third_party/cudnn_frontend/include -D_GLIBCXX_USE_CXX11_ABI=1 -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOROCTRACER -DLIBKINETO_NOXPUPTI=ON -DUSE_FBGEMM -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-unused-parameter -Wno-strict-overflow -Wno-strict-aliasing -Wno-stringop-overflow -Wsuggest-override -Wno-psabi -Wno-error=old-style-cast -Wno-missing-braces -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow -DHAVE_AVX512_CPU_DEFINITION -DHAVE_AVX2_CPU_DEFINITION -O3 -DNDEBUG -DNDEBUG -std=gnu++17 -fPIC -DMKL_HAS_SBGEMM -DTORCH_USE_LIBUV -DCAFFE2_USE_GLOO -Wall -Wextra -Wdeprecated -Wno-unused-parameter -Wno-missing-field-initializers -Wno-type-limits 
-Wno-array-bounds -Wno-unknown-pragmas -Wno-strict-overflow -Wno-strict-aliasing -Wunused-function -Wunused-variable -Wunused-but-set-variable -Wno-maybe-uninitialized -fvisibility=hidden -O2 -MD -MT caffe2/CMakeFiles/torch_cuda.dir/
/aten/src/ATen/native/quantized/cudnn/Conv.cpp.o -MF caffe2/CMakeFiles/torch_cuda.dir//aten/src/ATen/native/quantized/cudnn/Conv.cpp.o.d -o caffe2/CMakeFiles/torch_cuda.dir//aten/src/ATen/native/quantized/cudnn/Conv.cpp.o -c /home/mike/pytorch/aten/src/ATen/native/quantized/cudnn/Conv.cpp
In file included from /home/mike/pytorch/aten/src/ATen/native/quantized/cudnn/Conv.cpp:9:
/home/mike/pytorch/aten/src/ATen/cuda/Exceptions.h:111:48: error: ‘cusolverStatus_t’ was not declared in this scope; did you mean ‘cusparseStatus_t’?
111 | C10_EXPORT const char* cusolverGetErrorMessage(cusolverStatus_t status);
| ^~~~~~~~~~~~~~~~
| cusparseStatus_t
[7057/7981] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/torch/csrc/distributed/c10d/CUDASymmetricMemory.cu.o
/home/mike/pytorch/torch/csrc/distributed/c10d/CUDASymmetricMemory.cu(25): warning: parameter “device_idx” was declared but never referenced

/home/mike/pytorch/torch/csrc/distributed/c10d/CUDASymmetricMemory.cu(154): warning: function “::IpcChannel::broadcast_fds” was declared but never referenced

[7063/7981] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/torch/csrc/distributed/c10d/NanCheck.cu.o
ninja: build stopped: subcommand failed.

The actual failure is:

/home/mike/pytorch/aten/src/ATen/cuda/Exceptions.h:111:48: error: ‘cusolverStatus_t’ was not declared in this scope; did you mean ‘cusparseStatus_t’?
111 | C10_EXPORT const char* cusolverGetErrorMessage(cusolverStatus_t status);
| ^~~~~~~~~~~~~~~~
| cusparseStatus_t

which points to missing cuSOLVER headers: cusolverStatus_t is declared in the cuSOLVER headers that ship with the CUDA toolkit. Check whether cuSOLVER is actually installed, and if not, reinstall your CUDA toolkit with it.
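A quick way to check (a rough sketch; the paths assume a default toolkit layout under /usr/local/cuda-11.4, and the package name assumes NVIDIA's CUDA apt repository on Ubuntu):

# Verify the cuSOLVER header and library are present in the toolkit
ls /usr/local/cuda-11.4/include/cusolver*.h
ls /usr/local/cuda-11.4/lib64/libcusolver*

# If they are missing, (re)install the cuSOLVER component of the toolkit, e.g.:
sudo apt-get install --reinstall libcusolver-dev-11-4

After that, clear the old build state (python3 setup.py clean, or remove the build/ directory) so CMake re-detects the toolkit before you rebuild.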