Build failed: nvcc fatal : Unsupported gpu architecture 'compute_89'

Hi there,

I’ve been attempting to build PyTorch from source, to no avail: the build fails with an nvcc fatal error. Please see the relevant parts of the log below:

...
-- Found CUDA: /usr/local/cuda (found version "12.1") 
-- The CUDA compiler identification is NVIDIA 11.5.119
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Check for working CUDA compiler: /usr/bin/nvcc - skipped
-- Detecting CUDA compile features
-- Detecting CUDA compile features - done
-- Found CUDAToolkit: /usr/include (found version "11.5.119") 
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE  
-- Caffe2: CUDA detected: 12.1
-- Caffe2: CUDA nvcc is: /usr/local/cuda/bin/nvcc
-- Caffe2: CUDA toolkit directory: /usr/local/cuda
-- Caffe2: Header version is: 12.1
CMake Warning at cmake/public/cuda.cmake:143 (message):
  Failed to compute shorthash for libnvrtc.so
Call Stack (most recent call first):
  cmake/Dependencies.cmake:43 (include)
  CMakeLists.txt:853 (include)


-- Found CUDNN: /usr/lib/x86_64-linux-gnu/libcudnn.so  
-- Could NOT find CUSPARSELT (missing: CUSPARSELT_LIBRARY_PATH CUSPARSELT_INCLUDE_PATH) 
CMake Warning at cmake/public/cuda.cmake:234 (message):
  Cannot find cuSPARSELt library.  Turning the option off
Call Stack (most recent call first):
  cmake/Dependencies.cmake:43 (include)
  CMakeLists.txt:853 (include)

-- Automatic GPU detection failed. Building for common architectures.
-- Autodetected CUDA architecture(s): 5.0;8.0;8.6;8.9;9.0;9.0a
-- Added CUDA NVCC flags for: -gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_89,code=sm_89;-gencode;arch=compute_90,code=sm_90;-gencode;arch=compute_90a,code=sm_90a
CMake Warning at cmake/Dependencies.cmake:90 (message):
  Not compiling with XPU.  Could NOT find SYCL.Suppress this warning with
  -DUSE_XPU=OFF.
Call Stack (most recent call first):
  CMakeLists.txt:853 (include)
.....
-- Found CUB: /usr/local/cuda/include  
-- Converting CMAKE_CUDA_FLAGS to CUDA_NVCC_FLAGS:
    CUDA_NVCC_FLAGS                = -DLIBCUDACXX_ENABLE_SIMPLIFIED_COMPLEX_OPERATIONS;-D_GLIBCXX_USE_CXX11_ABI=1;-Xfatbin;-compress-all;-DONNX_NAMESPACE=onnx_torch;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_89,code=sm_89;-gencode;arch=compute_90,code=sm_90;-gencode;arch=compute_90a,code=sm_90a;-Xcudafe;--diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl;--expt-relaxed-constexpr;--expt-extended-lambda
    CUDA_NVCC_FLAGS_DEBUG          = -g
    CUDA_NVCC_FLAGS_RELEASE        = -O3;-DNDEBUG
    CUDA_NVCC_FLAGS_RELWITHDEBINFO = -O2;-g;-DNDEBUG
    CUDA_NVCC_FLAGS_MINSIZEREL     = -O1;-DNDEBUG




  CMake variable CUDAToolkit_ROOT is set to:
    /usr/local/cuda
-- Found CUDAToolkit: /usr/include (found suitable version "11.5.119", minimum required is "7.0") 
-- CUDA detected: 11.5.119

--   Protobuf includes     : 
--   Protobuf libraries    : 
--   BUILD_ONNX_PYTHON     : OFF
-- Found CUDA with FP16 support, compiling with torch.cuda.HalfTensor
-- Adding -DNDEBUG to compile flags
-- Checking prototype magma_get_sgeqrf_nb for MAGMA_V2
-- Checking prototype magma_get_sgeqrf_nb for MAGMA_V2 - False
-- Compiling with MAGMA support
-- MAGMA INCLUDE DIRECTORIES: /home/handrianomena/anaconda3/envs/dl-env/include
-- MAGMA LIBRARIES: /home/handrianomena/anaconda3/envs/dl-env/lib/libmagma.a
-- MAGMA V2 check: 0
...
-- _GLIBCXX_USE_CXX11_ABI=1 is already defined as a cmake variable
-- Automatic GPU detection failed. Building for common architectures.
-- Autodetected CUDA architecture(s): 5.0;8.0;8.6;8.9;9.0;9.0a
...
--   TORCH_VERSION         : 2.5.0
--   BUILD_STATIC_RUNTIME_BENCHMARK: OFF
--   BUILD_BINARY          : OFF
--   BUILD_PYTHON          : True
--     Python version      : 3.10.14
--   BUILD_SHARED_LIBS     : ON
--   CAFFE2_USE_MSVC_STATIC_RUNTIME     : OFF
--   BUILD_TEST            : True
--   BUILD_JNI             : OFF
--   BUILD_MOBILE_AUTOGRAD : OFF
--   BUILD_LITE_INTERPRETER: OFF
--   INTERN_BUILD_MOBILE   : 
--   TRACING_BASED         : OFF
--   USE_BLAS              : 1
--     BLAS                : mkl
--     BLAS_HAS_SBGEMM     : 
--   USE_LAPACK            : 1
--     LAPACK              : mkl
--   USE_ASAN              : OFF
--   USE_TSAN              : OFF
--   USE_CPP_CODE_COVERAGE : OFF
--   USE_CUDA              : ON
--     Split CUDA          : 
--     CUDA static link    : OFF
--     USE_CUDNN           : ON
--     USE_CUSPARSELT      : OFF
--     CUDA version        : 12.1
--     USE_FLASH_ATTENTION : ON
--     USE_MEM_EFF_ATTENTION : ON
--     cuDNN version       : 9.2.1
--     CUDA root directory : /usr/local/cuda
--     CUDA library        : /usr/lib/x86_64-linux-gnu/libcuda.so
--     cudart library      : /usr/local/cuda/lib64/libcudart.so
--     cublas library      : /usr/local/cuda/lib64/libcublas.so
--     cufft library       : /usr/local/cuda/lib64/libcufft.so
--     curand library      : /usr/local/cuda/lib64/libcurand.so
--     cusparse library    : /usr/local/cuda/lib64/libcusparse.so
--     cuDNN library       : /usr/lib/x86_64-linux-gnu/libcudnn.so
--     nvrtc               : /usr/local/cuda/lib64/libnvrtc.so
--     CUDA include path   : /usr/local/cuda/include
--     NVCC executable     : /usr/local/cuda/bin/nvcc
--     CUDA compiler       : /usr/bin/nvcc
--     CUDA flags          :  -DLIBCUDACXX_ENABLE_SIMPLIFIED_COMPLEX_OPERATIONS -D_GLIBCXX_USE_CXX11_ABI=1 -Xfatbin -compress-all -DONNX_NAMESPACE=onnx_torch -gencode arch=compute_50,code=sm_50 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_86,code=sm_86 -gencode arch=compute_89,code=sm_89 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda  -Wno-deprecated-gpu-targets --expt-extended-lambda -DCUB_WRAPPED_NAMESPACE=at_cuda_detail -DCUDA_HAS_FP16=1 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__
--     CUDA host compiler  : 
--     CUDA --device-c     : OFF
--     USE_TENSORRT        : 
--   USE_XPU               : OFF
--   USE_ROCM              : OFF
--   BUILD_NVFUSER         : 
--   USE_EIGEN_FOR_BLAS    : 
--   USE_FBGEMM            : ON
--     USE_FAKELOWP          : OFF
--   USE_KINETO            : ON
--   USE_GFLAGS            : OFF
--   USE_GLOG              : OFF
--   USE_LITE_PROTO        : OFF
--   USE_PYTORCH_METAL     : OFF
--   USE_PYTORCH_METAL_EXPORT     : OFF
--   USE_MPS               : OFF
--   USE_MKL               : ON
--   USE_MKLDNN            : ON
--   USE_MKLDNN_ACL        : OFF
--   USE_MKLDNN_CBLAS      : OFF
--   USE_UCC               : OFF
--   USE_ITT               : ON
--   USE_NCCL              : ON
--     USE_SYSTEM_NCCL     : OFF
--   USE_NNPACK            : ON
--   USE_NUMPY             : ON
--   USE_OBSERVERS         : ON
--   USE_OPENCL            : OFF
--   USE_OPENMP            : ON
--   USE_MIMALLOC          : OFF
--   USE_VULKAN            : OFF
--   USE_PROF              : OFF
--   USE_PYTORCH_QNNPACK   : ON
--   USE_XNNPACK           : ON
--   USE_DISTRIBUTED       : ON
--     USE_MPI               : ON
--     USE_GLOO              : ON
--     USE_GLOO_WITH_OPENSSL : OFF
--     USE_TENSORPIPE        : ON
--   Public Dependencies  : caffe2::mkl
--   Private Dependencies : Threads::Threads;pthreadpool;cpuinfo;pytorch_qnnpack;nnpack;XNNPACK;fbgemm;ittnotify;fp16;caffe2::openmp;tensorpipe;nlohmann;gloo;foxi_loader;rt;fmt::fmt-header-only;kineto;gcc_s;gcc;dl
--   Public CUDA Deps.    : 
--   Private CUDA Deps.   : caffe2::curand;caffe2::cufft;caffe2::cublas;torch::cudnn;__caffe2_nccl;tensorpipe_cuda;gloo_cuda;/usr/local/cuda/lib64/libcudart.so;CUDA::cusparse;CUDA::cufft;ATEN_CUDA_FILES_GEN_LIB
--   USE_COREML_DELEGATE     : OFF
--   BUILD_LAZY_TS_BACKEND   : ON
--   USE_ROCM_KERNEL_ASSERT : OFF
-- Performing Test HAS_WMISSING_PROTOTYPES
-- Performing Test HAS_WMISSING_PROTOTYPES - Failed
-- Performing Test HAS_WERROR_MISSING_PROTOTYPES
-- Performing Test HAS_WERROR_MISSING_PROTOTYPES - Failed
-- Configuring done (48.7s)
-- Generating done (2.7s)
CMake Warning:
  Manually-specified variables were not used by the project:

[6399/8810] Building CUDA object c10/cuda/test/CMakeFiles/c10_cuda_CUDAAssertionsTest_1_var_test.dir/impl/CUDAAssertionsTest_1_var_test.cu.o
FAILED: c10/cuda/test/CMakeFiles/c10_cuda_CUDAAssertionsTest_1_var_test.dir/impl/CUDAAssertionsTest_1_var_test.cu.o 
/usr/bin/nvcc -forward-unknown-to-host-compiler -DFLASHATTENTION_DISABLE_ALIBI -DHAVE_MALLOC_USABLE_SIZE=1 -DHAVE_MMAP=1 -DHAVE_SHM_OPEN=1 -DHAVE_SHM_UNLINK=1 -DIDEEP_USE_MKL -DMINIZ_DISABLE_ZIP_READER_CRC32_CHECKS -DONNXIFI_ENABLE_EXT=1 -DONNX_ML=1 -DONNX_NAMESPACE=onnx_torch -DUSE_EXTERNAL_MZCRC -D_FILE_OFFSET_BITS=64 -I/home/handrianomena/research/pkge/pytorch/build/aten/src -I/home/handrianomena/research/pkge/pytorch/aten/src -I/home/handrianomena/research/pkge/pytorch/build -I/home/handrianomena/research/pkge/pytorch -I/home/handrianomena/research/pkge/pytorch/cmake/../third_party/benchmark/include -I/home/handrianomena/research/pkge/pytorch/third_party/onnx -I/home/handrianomena/research/pkge/pytorch/build/third_party/onnx -I/home/handrianomena/research/pkge/pytorch/third_party/foxi -I/home/handrianomena/research/pkge/pytorch/build/third_party/foxi -I/home/handrianomena/research/pkge/pytorch/nlohmann -I/home/handrianomena/research/pkge/pytorch/c10/cuda/../.. -I/home/handrianomena/research/pkge/pytorch/c10/.. 
-isystem /home/handrianomena/research/pkge/pytorch/build/third_party/gloo -isystem /home/handrianomena/research/pkge/pytorch/cmake/../third_party/gloo -isystem /home/handrianomena/research/pkge/pytorch/cmake/../third_party/tensorpipe/third_party/libuv/include -isystem /home/handrianomena/research/pkge/pytorch/cmake/../third_party/googletest/googlemock/include -isystem /home/handrianomena/research/pkge/pytorch/cmake/../third_party/googletest/googletest/include -isystem /home/handrianomena/research/pkge/pytorch/third_party/protobuf/src -isystem /home/handrianomena/anaconda3/envs/dl-env/include -isystem /home/handrianomena/research/pkge/pytorch/third_party/XNNPACK/include -isystem /home/handrianomena/research/pkge/pytorch/third_party/ittapi/include -isystem /home/handrianomena/research/pkge/pytorch/cmake/../third_party/eigen -isystem /usr/local/cuda/include -isystem /home/handrianomena/research/pkge/pytorch/third_party/ideep/mkl-dnn/include/oneapi/dnnl -isystem /home/handrianomena/research/pkge/pytorch/third_party/ideep/include -isystem /home/handrianomena/research/pkge/pytorch/INTERFACE -isystem /home/handrianomena/research/pkge/pytorch/third_party/nlohmann/include -isystem /home/handrianomena/research/pkge/pytorch/third_party/googletest/googletest/include -isystem /home/handrianomena/research/pkge/pytorch/third_party/googletest/googletest -DLIBCUDACXX_ENABLE_SIMPLIFIED_COMPLEX_OPERATIONS -D_GLIBCXX_USE_CXX11_ABI=1 -Xfatbin -compress-all -DONNX_NAMESPACE=onnx_torch -gencode arch=compute_50,code=sm_50 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_86,code=sm_86 -gencode arch=compute_89,code=sm_89 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl 
--expt-relaxed-constexpr --expt-extended-lambda  -Wno-deprecated-gpu-targets --expt-extended-lambda -DCUB_WRAPPED_NAMESPACE=at_cuda_detail -DCUDA_HAS_FP16=1 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -O3 -DNDEBUG -std=c++17 -Xcompiler=-fPIE -DMKL_HAS_SBGEMM -DTORCH_USE_LIBUV -DCAFFE2_USE_GLOO -MD -MT c10/cuda/test/CMakeFiles/c10_cuda_CUDAAssertionsTest_1_var_test.dir/impl/CUDAAssertionsTest_1_var_test.cu.o -MF c10/cuda/test/CMakeFiles/c10_cuda_CUDAAssertionsTest_1_var_test.dir/impl/CUDAAssertionsTest_1_var_test.cu.o.d -x cu -c /home/handrianomena/research/pkge/pytorch/c10/cuda/test/impl/CUDAAssertionsTest_1_var_test.cu -o c10/cuda/test/CMakeFiles/c10_cuda_CUDAAssertionsTest_1_var_test.dir/impl/CUDAAssertionsTest_1_var_test.cu.o
nvcc fatal   : Unsupported gpu architecture 'compute_89'

This is the output of nvcc -V:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Mon_Apr__3_17:16:06_PDT_2023
Cuda compilation tools, release 12.1, V12.1.105
Build cuda_12.1.r12.1/compiler.32688072_0

and that of nvidia-smi:

+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.183.01             Driver Version: 535.183.01   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce GTX TITAN X     On  | 00000000:03:00.0 Off |                  N/A |
| 22%   28C    P8              15W / 250W |      1MiB / 12288MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   1  NVIDIA GeForce GTX TITAN X     On  | 00000000:82:00.0 Off |                  N/A |
| 22%   27C    P8              16W / 250W |      1MiB / 12288MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+

Can someone please assist? I also tried installing a few GPU-enabled versions of PyTorch via pip and conda, but was not successful either.

Many thanks.

You are mixing two different CUDA toolkits in your setup (11.5 and 12.1): the log shows CMake detecting the 12.1 toolkit under /usr/local/cuda while actually compiling with the 11.5 nvcc at /usr/bin/nvcc, which predates sm_89 and thus fails on `compute_89`.
Either fix your CUDA setup so that only one compiler is detected during the build, or just install the PyTorch binaries (stable or nightly) if you don’t need to change PyTorch code and thus wouldn’t need a source build.

Thanks. I first updated my .bashrc using:

export PATH=/usr/local/cuda/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/usr/local/cuda/lib64

then installed the PyTorch binaries via pip. Now `torch.cuda.is_available()` returns True; however, when moving a tensor to the GPU it complains:

RuntimeError: CUDA error: CUDA-capable device(s) is/are busy or unavailable
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.

Is my CUDA setup still not fixed properly?

Thanks

Your locally installed CUDA toolkit won’t be used when you install the PyTorch binaries, as these ship with their own CUDA runtime dependencies.
The error potentially points towards a driver issue. You could try to compile and run any CUDA sample to make sure your setup is able to utilize your GPUs.
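A minimal sanity check along these lines could look as follows (a sketch, not an official sample: it assumes `nvcc` is on `PATH` and uses `sm_52` to match the GTX TITAN X cards shown above; `gpu_check.cu` is just an illustrative file name):

```shell
# Write a tiny CUDA program that runs one kernel and copies the result back
cat > gpu_check.cu <<'EOF'
#include <cstdio>
#include <cuda_runtime.h>

__global__ void kern(int *out) { *out = 42; }

int main() {
    int *d_val = nullptr, h_val = 0;
    if (cudaMalloc(&d_val, sizeof(int)) != cudaSuccess) {
        printf("cudaMalloc failed: device busy or unavailable?\n");
        return 1;
    }
    kern<<<1, 1>>>(d_val);
    cudaMemcpy(&h_val, d_val, sizeof(int), cudaMemcpyDeviceToHost);
    printf(h_val == 42 ? "GPU OK\n" : "GPU FAILED\n");
    return h_val == 42 ? 0 : 1;
}
EOF

# Compile and run it if nvcc is available
if command -v nvcc >/dev/null 2>&1; then
    nvcc -arch=sm_52 gpu_check.cu -o gpu_check && ./gpu_check
else
    echo "nvcc not found on PATH"
fi
```

If even this fails with a "busy or unavailable" error while `nvidia-smi` shows no running processes, the driver installation itself is the likely culprit.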

Thanks. I will have a look.