Build from source for sm_35 failed

Hello! This is my first post here, so please let me know if I'm doing something wrong.
I would like to use my two K40c GPUs with PyTorch, but I keep failing to build from source.
Here is my PC info:

os: Ubuntu 18.04
nvidia-driver: nvidia-driver-450-server
cuda: 11.0
cuDNN: not installed
cmake: 3.26.4
g++: 9.4.0

I followed the instructions in the official GitHub repo and ran the following commands:

conda install cmake ninja
conda install mkl mkl-include
conda install -c pytorch magma-cuda110  # or the magma-cuda* that matches your CUDA version from https://anaconda.org/pytorch/repo
git clone --recursive https://github.com/pytorch/pytorch
cd pytorch
git submodule sync
git submodule update --init --recursive
pip install -r requirements.txt
export CMAKE_PREFIX_PATH=${CONDA_PREFIX:-"$(dirname $(which conda))/../"}
TORCH_CUDA_ARCH_LIST="3.5" python setup.py install
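(For context, the `"3.5"` in `TORCH_CUDA_ARCH_LIST` is what CMake expands into the nvcc `-gencode` flags that show up later in the log. A rough illustrative sketch of that mapping, not PyTorch's actual CMake code:)

```python
def arch_flags(arch_list):
    """Translate a TORCH_CUDA_ARCH_LIST-style string like "3.5;7.0"
    into nvcc -gencode arguments (simplified sketch)."""
    flags = []
    for arch in arch_list.replace(";", " ").split():
        num = arch.replace(".", "")  # "3.5" -> "35"
        flags.append(f"-gencode=arch=compute_{num},code=sm_{num}")
    return flags

print(arch_flags("3.5"))
# → ['-gencode=arch=compute_35,code=sm_35']
```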

and got this output:

(test2) moheji@ubuntu:~/pytorch$ TORCH_CUDA_ARCH_LIST="3.5" python setup.py install
Building wheel torch-2.1.0a0+git849fbc6
-- Building version 2.1.0a0+git849fbc6
cmake -GNinja -DBUILD_PYTHON=True -DBUILD_TEST=True -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=/home/moheji/pytorch/torch -DCMAKE_PREFIX_PATH=/home/moheji/anaconda3/envs/test2/lib/python3.8/site-packages;/home/moheji/anaconda3/envs/test2 -DNUMPY_INCLUDE_DIR=/home/moheji/anaconda3/envs/test2/lib/python3.8/site-packages/numpy/core/include -DPYTHON_EXECUTABLE=/home/moheji/anaconda3/envs/test2/bin/python -DPYTHON_INCLUDE_DIR=/home/moheji/anaconda3/envs/test2/include/python3.8 -DPYTHON_LIBRARY=/home/moheji/anaconda3/envs/test2/lib/libpython3.8.so.1.0 -DTORCH_BUILD_VERSION=2.1.0a0+git849fbc6 -DTORCH_CUDA_ARCH_LIST=3.5 -DUSE_NUMPY=True -DUSE_ROCM=0 /home/moheji/pytorch
-- The CXX compiler identification is GNU 9.4.0
-- The C compiler identification is GNU 9.4.0
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- /usr/bin/c++ /home/moheji/pytorch/torch/abi-check.cpp -o /home/moheji/pytorch/build/abi-check
-- Determined _GLIBCXX_USE_CXX11_ABI=1
-- Not forcing any particular BLAS to be found
-- Could not find ccache. Consider installing ccache to speed up compilation.
-- Performing Test C_HAS_AVX_1 - Failed
-- Performing Test C_HAS_AVX2_1 - Failed
-- Performing Test C_HAS_AVX512_1 - Failed
-- Performing Test C_HAS_AVX512_2 - Failed
-- Performing Test C_HAS_AVX512_3 - Failed
-- Performing Test CXX_HAS_AVX_1 - Failed
-- Performing Test CXX_HAS_AVX2_1 - Failed
-- Performing Test CXX_HAS_AVX512_1 - Failed
-- Performing Test CXX_HAS_AVX512_2 - Failed
-- Performing Test CXX_HAS_AVX512_3 - Failed
-- Current compiler supports avx2 extension. Will build perfkernels.
-- Current compiler supports avx512f extension. Will build fbgemm.
-- Performing Test COMPILER_SUPPORTS_RDYNAMIC
-- Performing Test COMPILER_SUPPORTS_RDYNAMIC - Success
-- Found CUDA: /usr/local/cuda (found version "11.0") 
-- The CUDA compiler identification is NVIDIA 11.0.194
-- Detecting CUDA compiler ABI info - done
-- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc - skipped
-- Detecting CUDA compile features - done
-- Found CUDAToolkit: /usr/local/cuda/include (found version "11.0.194") 
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Failed
-- Found Threads: TRUE  
-- Caffe2: CUDA detected: 11.0
-- Caffe2: CUDA nvcc is: /usr/local/cuda/bin/nvcc
-- Caffe2: CUDA toolkit directory: /usr/local/cuda
-- Caffe2: Header version is: 11.0
-- /usr/local/cuda/lib64/libnvrtc.so shorthash is b31c2d61
-- Could NOT find CUDNN (missing: CUDNN_LIBRARY_PATH CUDNN_INCLUDE_PATH) 
CMake Warning at cmake/public/cuda.cmake:251 (message):
  Cannot find cuDNN library.  Turning the option off
Call Stack (most recent call first):
  cmake/Dependencies.cmake:44 (include)
  CMakeLists.txt:722 (include)


-- Could NOT find CUSPARSELT (missing: CUSPARSELT_LIBRARY_PATH CUSPARSELT_INCLUDE_PATH) 
CMake Warning at cmake/public/cuda.cmake:276 (message):
  Cannot find cuSPARSELt library.  Turning the option off
Call Stack (most recent call first):
  cmake/Dependencies.cmake:44 (include)
  CMakeLists.txt:722 (include)


-- Added CUDA NVCC flags for: -gencode;arch=compute_35,code=sm_35
-- Building using own protobuf under third_party per request.
-- Use custom protobuf build.
-- 
This warning is for project developers.  Use -Wno-dev to suppress it.

-- Found PythonInterp: /home/moheji/anaconda3/envs/test2/bin/python (found version "3.8.17") 
-- NNPACK backend is x86-64
-- Found Python: /home/moheji/anaconda3/envs/test2/bin/python3.8 (found version "3.8.17") found components: Interpreter 
-- Failed to find LLVM FileCheck
-- Found Git: /usr/bin/git (found version "2.17.1") 
-- git version: v1.6.1 normalized to 1.6.1
-- Version: 1.6.1
-- Looking for shm_open in rt - found
-- Performing Test HAVE_CXX_FLAG_WSHORTEN_64_TO_32 - Failed
-- Performing Test HAVE_CXX_FLAG_WD654 - Failed
-- Performing Test HAVE_CXX_FLAG_WTHREAD_SAFETY - Failed
-- Performing Test HAVE_STD_REGEX
-- Performing Test HAVE_GNU_POSIX_REGEX -- failed to compile
CMake Warning (dev) at /home/moheji/anaconda3/envs/test2/share/cmake-3.26/Modules/FindPackageHandleStandardArgs.cmake:438 (message):
  The package name passed to `find_package_handle_standard_args` (OpenMP_C)
  does not match the name of the calling package (OpenMP).  This can lead to
  problems in calling code that expects `find_package` result variables
  (e.g., `_FOUND`) to follow a certain pattern.
Call Stack (most recent call first):
  cmake/Modules/FindOpenMP.cmake:584 (find_package_handle_standard_args)
  third_party/fbgemm/CMakeLists.txt:129 (find_package)
This warning is for project developers.  Use -Wno-dev to suppress it.


CMake Warning at third_party/fbgemm/CMakeLists.txt:227 (message):
  CMAKE_CXX_FLAGS_RELEASE is -O3 -DNDEBUG


CMake Warning at third_party/fbgemm/CMakeLists.txt:228 (message):
  ==========

** AsmJit Summary **
   ASMJIT_DIR=/home/moheji/pytorch/third_party/fbgemm/third_party/asmjit
   ASMJIT_TEST=FALSE
   ASMJIT_TARGET_TYPE=STATIC
   ASMJIT_DEPS=pthread;rt
   ASMJIT_LIBS=asmjit;pthread;rt
   ASMJIT_CFLAGS=-DASMJIT_STATIC
   ASMJIT_PRIVATE_CFLAGS=-Wall;-Wextra;-Wconversion;-fno-math-errno;-fno-threadsafe-statics;-fno-semantic-interposition;-DASMJIT_STATIC
   ASMJIT_PRIVATE_CFLAGS_DBG=
   ASMJIT_PRIVATE_CFLAGS_REL=-O2;-fmerge-all-constants;-fno-enforce-eh-specs
-- Could NOT find Numa (missing: Numa_INCLUDE_DIR Numa_LIBRARIES) 
CMake Warning at cmake/Dependencies.cmake:903 (message):
  Not compiling with NUMA.  Suppress this warning with -DUSE_NUMA=OFF
Call Stack (most recent call first):
  CMakeLists.txt:722 (include)



-- Adding OpenMP CXX_FLAGS: -fopenmp
-- Will link against OpenMP libraries: /usr/lib/gcc/x86_64-linux-gnu/9/libgomp.so;/usr/lib/x86_64-linux-gnu/libpthread.so
CMake Warning at cmake/External/nccl.cmake:70 (message):
  Enabling NCCL library slimming
Call Stack (most recent call first):
  cmake/Dependencies.cmake:1348 (include)
  CMakeLists.txt:722 (include)


-- Found CUB: /usr/local/cuda/include  
-- Converting CMAKE_CUDA_FLAGS to CUDA_NVCC_FLAGS:
    CUDA_NVCC_FLAGS                = -D_GLIBCXX_USE_CXX11_ABI=1;-Xfatbin;-compress-all;-DONNX_NAMESPACE=onnx_torch;-gencode;arch=compute_35,code=sm_35;-Xcudafe;--diag_suppress=cc_clobber_ignored,--diag_suppress=set_but_not_used,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl;--expt-relaxed-constexpr;--expt-extended-lambda
    CUDA_NVCC_FLAGS_DEBUG          = -g
    CUDA_NVCC_FLAGS_RELEASE        = -O3;-DNDEBUG
    CUDA_NVCC_FLAGS_RELWITHDEBINFO = -O2;-g;-DNDEBUG
    CUDA_NVCC_FLAGS_MINSIZEREL     = -O1;-DNDEBUG
-- Performing Test UV_LINT_W4
-- Performing Test UV_LINT_W4 - Failed
-- Performing Test UV_LINT_NO_UNUSED_PARAMETER_MSVC - Failed
-- Performing Test UV_LINT_NO_CONDITIONAL_CONSTANT_MSVC - Failed
-- Performing Test UV_LINT_NO_NONSTANDARD_MSVC - Failed
-- Performing Test UV_LINT_NO_NONSTANDARD_EMPTY_TU_MSVC - Failed
-- Performing Test UV_LINT_NO_NONSTANDARD_FILE_SCOPE_MSVC - Failed
-- Performing Test UV_LINT_NO_NONSTANDARD_NONSTATIC_DLIMPORT_MSVC - Failed
-- Performing Test UV_LINT_NO_HIDES_LOCAL - Failed
-- Performing Test UV_LINT_NO_HIDES_PARAM - Failed
-- Performing Test UV_LINT_NO_HIDES_GLOBAL - Failed
-- Performing Test UV_LINT_NO_CONDITIONAL_ASSIGNMENT_MSVC - Failed
-- Performing Test UV_LINT_NO_UNSAFE_MSVC - Failed
-- Performing Test UV_LINT_UTF8_MSVC - Failed
-- summary of build options:
    Install prefix:  /home/moheji/pytorch/torch
    Target system:   Linux
    Compiler:
      C compiler:    /usr/bin/cc
      CFLAGS:         

  For compatibility, CMake is ignoring the variable.
Call Stack (most recent call first):
  third_party/gloo/cmake/Dependencies.cmake:115 (include)
  third_party/gloo/CMakeLists.txt:111 (include)
This warning is for project developers.  Use -Wno-dev to suppress it.

-- Found CUDAToolkit: /usr/local/cuda/include (found suitable version "11.0.194", minimum required is "7.0") 
-- CUDA detected: 11.0.194
CMake Warning at cmake/Dependencies.cmake:1492 (message):
  Metal is only used in ios builds.
Call Stack (most recent call first):
  CMakeLists.txt:722 (include)


-- 
-- ******** Summary ********
-- General:
--   CMake version         : 3.26.4
--   CMake command         : /home/moheji/anaconda3/envs/test2/bin/cmake
--   System                : Linux
--   C++ compiler          : /usr/bin/c++
--   C++ compiler id       : GNU
--   C++ compiler version  : 9.4.0
--   Using ccache if found : ON
--   Found ccache          : CCACHE_PROGRAM-NOTFOUND
--   CXX flags             :  -D_GLIBCXX_USE_CXX11_ABI=1 -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOROCTRACER -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=old-style-cast -Wno-invalid-partial-specialization -Wno-unused-private-field -Wno-aligned-allocation-unavailable -Wno-missing-braces -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow
--   Build type            : Release
--   Compile definitions   : ONNX_ML=1;ONNXIFI_ENABLE_EXT=1;ONNX_NAMESPACE=onnx_torch;IDEEP_USE_MKL;HAVE_MMAP=1;_FILE_OFFSET_BITS=64;HAVE_SHM_OPEN=1;HAVE_SHM_UNLINK=1;HAVE_MALLOC_USABLE_SIZE=1;USE_EXTERNAL_MZCRC;MINIZ_DISABLE_ZIP_READER_CRC32_CHECKS;BUILD_NVFUSER
--   CMAKE_PREFIX_PATH     : /home/moheji/anaconda3/envs/test2/lib/python3.8/site-packages;/home/moheji/anaconda3/envs/test2;/usr/local/cuda;/usr/local/cuda
--   CMAKE_INSTALL_PREFIX  : /home/moheji/pytorch/torch
--   USE_GOLD_LINKER       : OFF
-- 
--   TORCH_VERSION         : 2.1.0
--   BUILD_CAFFE2          : OFF
--   BUILD_CAFFE2_OPS      : OFF
--   BUILD_STATIC_RUNTIME_BENCHMARK: OFF
--   BUILD_TENSOREXPR_BENCHMARK: OFF
--   BUILD_NVFUSER_BENCHMARK: OFF
--   BUILD_BINARY          : OFF
--   BUILD_CUSTOM_PROTOBUF : ON
--     Link local protobuf : ON
--   BUILD_DOCS            : OFF
--   BUILD_PYTHON          : True
--     Python version      : 3.8.17
--     Python executable   : /home/moheji/anaconda3/envs/test2/bin/python
--     Pythonlibs version  : 3.8.17
--     Python library      : /home/moheji/anaconda3/envs/test2/lib/libpython3.8.so.1.0
--     Python includes     : /home/moheji/anaconda3/envs/test2/include/python3.8
--     Python site-packages: lib/python3.8/site-packages
--   BUILD_SHARED_LIBS     : ON
--   CAFFE2_USE_MSVC_STATIC_RUNTIME     : OFF
--   BUILD_TEST            : True
--   BUILD_JNI             : OFF
--   BUILD_MOBILE_AUTOGRAD : OFF
--   BUILD_LITE_INTERPRETER: OFF
--   INTERN_BUILD_MOBILE   : 
--   TRACING_BASED         : OFF
--   USE_BLAS              : 1
--     BLAS                : mkl
--     BLAS_HAS_SBGEMM     : 
--   USE_LAPACK            : 1
--     LAPACK              : mkl
--   USE_ASAN              : OFF
--   USE_TSAN              : OFF
--   USE_CPP_CODE_COVERAGE : OFF
--   USE_CUDA              : ON
--     Split CUDA          : 
--     CUDA static link    : OFF
--     USE_CUDNN           : OFF
--     USE_EXPERIMENTAL_CUDNN_V8_API: ON
--     USE_CUSPARSELT      : OFF
--     CUDA version        : 11.0
--     USE_FLASH_ATTENTION : OFF
--     CUDA root directory : /usr/local/cuda
--     CUDA library        : /usr/lib/x86_64-linux-gnu/libcuda.so
--     cudart library      : /usr/local/cuda/lib64/libcudart.so
--     cublas library      : /usr/local/cuda/lib64/libcublas.so
--     cufft library       : /usr/local/cuda/lib64/libcufft.so
--     curand library      : /usr/local/cuda/lib64/libcurand.so
--     cusparse library    : /usr/local/cuda/lib64/libcusparse.so
--     nvrtc               : /usr/local/cuda/lib64/libnvrtc.so
--     CUDA include path   : /usr/local/cuda/include
--     NVCC executable     : /usr/local/cuda/bin/nvcc
--     CUDA compiler       : /usr/local/cuda/bin/nvcc
--     CUDA flags          :  -D_GLIBCXX_USE_CXX11_ABI=1 -Xfatbin -compress-all -DONNX_NAMESPACE=onnx_torch -gencode arch=compute_35,code=sm_35 -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=set_but_not_used,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda  -Wno-deprecated-gpu-targets --expt-extended-lambda -DCUDA_HAS_FP16=1 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__
--     CUDA host compiler  : 
--     CUDA --device-c     : OFF
--     USE_TENSORRT        : OFF
--   USE_ROCM              : 0
--   BUILD_NVFUSER         : ON
--   USE_EIGEN_FOR_BLAS    : 
--   USE_FBGEMM            : ON
--     USE_FAKELOWP          : OFF
--   USE_KINETO            : ON
--   USE_FFMPEG            : OFF
--   USE_GFLAGS            : OFF
--   USE_GLOG              : OFF
--   USE_LEVELDB           : OFF
--   USE_LITE_PROTO        : OFF
--   USE_LMDB              : OFF
--   USE_METAL             : OFF
--   USE_PYTORCH_METAL     : OFF
--   USE_PYTORCH_METAL_EXPORT     : OFF
--   USE_MPS               : OFF
--   USE_FFTW              : OFF
--   USE_MKL               : ON
--   USE_MKLDNN            : ON
--   USE_MKLDNN_ACL        : OFF
--   USE_MKLDNN_CBLAS      : OFF
--   USE_UCC               : OFF
--   USE_ITT               : ON
--   USE_NCCL              : ON
--     USE_SYSTEM_NCCL     : OFF
--     USE_NCCL_WITH_UCC   : OFF
--   USE_NNPACK            : ON
--   USE_NUMPY             : ON
--   USE_OBSERVERS         : ON
--   USE_OPENCL            : OFF
--   USE_OPENCV            : OFF
--   USE_OPENMP            : ON
--   USE_TBB               : OFF
--   USE_MIMALLOC          : OFF
--   USE_VULKAN            : OFF
--   USE_PROF              : OFF
--   USE_QNNPACK           : ON
--   USE_PYTORCH_QNNPACK   : ON
--   USE_XNNPACK           : ON
--   USE_REDIS             : OFF
--   USE_ROCKSDB           : OFF
--   USE_ZMQ               : OFF
--   USE_DISTRIBUTED       : ON
--     USE_MPI               : OFF
--     USE_GLOO              : ON
--     USE_GLOO_WITH_OPENSSL : OFF
--     USE_TENSORPIPE        : ON
--   Public Dependencies  : caffe2::mkl
--   Private Dependencies : Threads::Threads;pthreadpool;cpuinfo;qnnpack;pytorch_qnnpack;nnpack;XNNPACK;fbgemm;ittnotify;fp16;caffe2::openmp;tensorpipe;gloo;foxi_loader;rt;fmt::fmt-header-only;kineto;gcc_s;gcc;dl
--   Public CUDA Deps.    : caffe2::cufft;caffe2::curand;caffe2::cublas
--   Private CUDA Deps.   : __caffe2_nccl;tensorpipe_cuda;gloo_cuda;/usr/local/cuda/lib64/libcudart.so;CUDA::cusparse;CUDA::curand;CUDA::cufft;ATEN_CUDA_FILES_GEN_LIB
--   USE_COREML_DELEGATE     : OFF
--   BUILD_LAZY_TS_BACKEND   : ON
--   TORCH_DISABLE_GPU_ASSERTS : OFF
-- Performing Test HAS_WMISSING_PROTOTYPES
-- Performing Test HAS_WMISSING_PROTOTYPES - Failed
-- Performing Test HAS_WERROR_MISSING_PROTOTYPES
-- Performing Test HAS_WERROR_MISSING_PROTOTYPES - Failed
-- Configuring done (76.9s)
CMake Warning at caffe2/CMakeLists.txt:813 (add_library):
  Cannot generate a safe runtime search path for target torch_cpu because
  files in some directories may conflict with libraries in implicit
  directories:

    runtime library [libgomp.so.1] in /usr/lib/gcc/x86_64-linux-gnu/9 may be hidden by files in:
      /home/moheji/anaconda3/envs/test2/lib

  Some of these libraries may not be found correctly.


-- Generating done (2.3s)
-- Build files have been written to: /home/moheji/pytorch/build
cmake --build . --target install --config Release
[3/4] Generating ATen sources
[31/6953] Building CXX object third_party/protobuf...-lite.dir/__/src/google/protobuf/message_lite.cc.o
In file included from /usr/include/string.h:494,
generating /home/moheji/pytorch/build/third_party/onnx/onnx/onnx_data_pb.py
[4564/6953] Building CXX object third_party/kineto...Files/kineto_base.dir/src/DaemonConfigLoader.cpp.o
In file included from /home/moheji/pytorch/third_party/kineto/libkineto/third_party/dynolog/dynolog/src/ipcfabric/FabricManager.h:12,
                 from /home/moheji/pytorch/third_party/kineto/libkineto/src/IpcFabricConfigClient.h:21,
                 from /home/moheji/pytorch/third_party/kineto/libkineto/src/DaemonConfigLoader.cpp:16:
/home/moheji/pytorch/third_party/kineto/libkineto/third_party/dynolog/dynolog/src/ipcfabric/Endpoint.h: In instantiation of ‘bool dynolog::ipcfabric::EndPoint<kMaxNumFds>::tryPeekMsg(dynolog::ipcfabric::EndPoint<kMaxNumFds>::TCtxt&) [with long unsigned int kMaxNumFds = 0; dynolog::ipcfabric::EndPoint<kMaxNumFds>::TCtxt = dynolog::ipcfabric::EndPointCtxt<0>]’:
/home/moheji/pytorch/third_party/kineto/libkineto/third_party/dynolog/dynolog/src/ipcfabric/FabricManager.h:155:44:   required from here
/home/moheji/pytorch/third_party/kineto/libkineto/third_party/dynolog/dynolog/src/ipcfabric/Endpoint.h:174:50: warning: throw will always call terminate() [-Wterminate]
  174 |     throw std::runtime_error(std::strerror(errno));
      |                                                  ^
/home/moheji/pytorch/third_party/kineto/libkineto/third_party/dynolog/dynolog/src/ipcfabric/Endpoint.h: In instantiation of ‘const char* dynolog::ipcfabric::EndPoint<kMaxNumFds>::getName(const TCtxt&) const [with long unsigned int kMaxNumFds = 0; dynolog::ipcfabric::EndPoint<kMaxNumFds>::TCtxt = dynolog::ipcfabric::EndPointCtxt<0>]’:
/home/moheji/pytorch/third_party/kineto/libkineto/third_party/dynolog/dynolog/src/ipcfabric/FabricManager.h:170:54:   required from here
/home/moheji/pytorch/third_party/kineto/libkineto/third_party/dynolog/dynolog/src/ipcfabric/Endpoint.h:187:66: warning: throw will always call terminate() [-Wterminate]
  187 |             ". Expected to start with " + std::string(socket_dir));
      |                                                                  ^
/home/moheji/pytorch/third_party/kineto/libkineto/third_party/dynolog/dynolog/src/ipcfabric/Endpoint.h:194:48: warning: throw will always call terminate() [-Wterminate]
  194 |             std::string(ctxt.msg_name.sun_path));
      |                                                ^
/home/moheji/pytorch/third_party/kineto/libkineto/third_party/dynolog/dynolog/src/ipcfabric/Endpoint.h: In instantiation of ‘bool dynolog::ipcfabric::EndPoint<kMaxNumFds>::tryRcvMsg(dynolog::ipcfabric::EndPoint<kMaxNumFds>::TCtxt&) [with long unsigned int kMaxNumFds = 0; dynolog::ipcfabric::EndPoint<kMaxNumFds>::TCtxt = dynolog::ipcfabric::EndPointCtxt<0>]’:
/home/moheji/pytorch/third_party/kineto/libkineto/third_party/dynolog/dynolog/src/ipcfabric/FabricManager.h:178:45:   required from here
/home/moheji/pytorch/third_party/kineto/libkineto/third_party/dynolog/dynolog/src/ipcfabric/Endpoint.h:160:50: warning: throw will always call terminate() [-Wterminate]
  160 |     throw std::runtime_error(std::strerror(errno));
      |                                                  ^
[4574/6953] Building CXX object third_party/kineto...es/kineto_base.dir/src/IpcFabricConfigClient.cpp.o
In file included from /home/moheji/pytorch/third_party/kineto/libkineto/third_party/dynolog/dynolog/src/ipcfabric/FabricManager.h:12,
                 from /home/moheji/pytorch/third_party/kineto/libkineto/src/IpcFabricConfigClient.h:21,
                 from /home/moheji/pytorch/third_party/kineto/libkineto/src/IpcFabricConfigClient.cpp:11:
/home/moheji/pytorch/third_party/kineto/libkineto/third_party/dynolog/dynolog/src/ipcfabric/Endpoint.h: In instantiation of ‘bool dynolog::ipcfabric::EndPoint<kMaxNumFds>::tryPeekMsg(dynolog::ipcfabric::EndPoint<kMaxNumFds>::TCtxt&) [with long unsigned int kMaxNumFds = 0; dynolog::ipcfabric::EndPoint<kMaxNumFds>::TCtxt = dynolog::ipcfabric::EndPointCtxt<0>]’:
/home/moheji/pytorch/third_party/kineto/libkineto/third_party/dynolog/dynolog/src/ipcfabric/FabricManager.h:155:44:   required from here
/home/moheji/pytorch/third_party/kineto/libkineto/third_party/dynolog/dynolog/src/ipcfabric/Endpoint.h:174:50: warning: throw will always call terminate() [-Wterminate]
  174 |     throw std::runtime_error(std::strerror(errno));
      |                                                  ^
/home/moheji/pytorch/third_party/kineto/libkineto/third_party/dynolog/dynolog/src/ipcfabric/Endpoint.h: In instantiation of ‘const char* dynolog::ipcfabric::EndPoint<kMaxNumFds>::getName(const TCtxt&) const [with long unsigned int kMaxNumFds = 0; dynolog::ipcfabric::EndPoint<kMaxNumFds>::TCtxt = dynolog::ipcfabric::EndPointCtxt<0>]’:
/home/moheji/pytorch/third_party/kineto/libkineto/third_party/dynolog/dynolog/src/ipcfabric/FabricManager.h:170:54:   required from here
/home/moheji/pytorch/third_party/kineto/libkineto/third_party/dynolog/dynolog/src/ipcfabric/Endpoint.h:187:66: warning: throw will always call terminate() [-Wterminate]
  187 |             ". Expected to start with " + std::string(socket_dir));
      |                                                                  ^
/home/moheji/pytorch/third_party/kineto/libkineto/third_party/dynolog/dynolog/src/ipcfabric/Endpoint.h:194:48: warning: throw will always call terminate() [-Wterminate]
  194 |             std::string(ctxt.msg_name.sun_path));
      |                                                ^
/home/moheji/pytorch/third_party/kineto/libkineto/third_party/dynolog/dynolog/src/ipcfabric/Endpoint.h: In instantiation of ‘bool dynolog::ipcfabric::EndPoint<kMaxNumFds>::tryRcvMsg(dynolog::ipcfabric::EndPoint<kMaxNumFds>::TCtxt&) [with long unsigned int kMaxNumFds = 0; dynolog::ipcfabric::EndPoint<kMaxNumFds>::TCtxt = dynolog::ipcfabric::EndPointCtxt<0>]’:
/home/moheji/pytorch/third_party/kineto/libkineto/third_party/dynolog/dynolog/src/ipcfabric/FabricManager.h:178:45:   required from here
/home/moheji/pytorch/third_party/kineto/libkineto/third_party/dynolog/dynolog/src/ipcfabric/Endpoint.h:160:50: warning: throw will always call terminate() [-Wterminate]
  160 |     throw std::runtime_error(std::strerror(errno));
      |                                                  ^
[4717/6953] Generating include/renameavx512fnofma.h
Generating renameavx512fnofma.h: mkrename cinz_ 8 16 avx512fnofma
[4723/6953] Generating include/renameavx512f.h
Generating renameavx512f.h: mkrename finz_ 8 16 avx512f
[4726/6953] Generating include/renameavx2.h
Generating renameavx2.h: mkrename finz_ 4 8 avx2
[4727/6953] Generating include/renameavx2128.h
Generating renameavx2128.h: mkrename finz_ 2 4 avx2128
[4764/6953] Building CXX object c10/cuda/CMakeFiles/c10_cuda.dir/CUDACachingAllocator.cpp.o
FAILED: c10/cuda/CMakeFiles/c10_cuda.dir/CUDACachingAllocator.cpp.o 
/usr/bin/c++ -DHAVE_MALLOC_USABLE_SIZE=1 -DHAVE_MMAP=1 -DHAVE_SHM_OPEN=1 -DHAVE_SHM_UNLINK=1 -DIDEEP_USE_MKL -DMINIZ_DISABLE_ZIP_READER_CRC32_CHECKS -DONNXIFI_ENABLE_EXT=1 -DONNX_ML=1 -DONNX_NAMESPACE=onnx_torch -DUSE_EXTERNAL_MZCRC -D_FILE_OFFSET_BITS=64 -Dc10_cuda_EXPORTS -I/home/moheji/pytorch/build/aten/src -I/home/moheji/pytorch/aten/src -I/home/moheji/pytorch/build -I/home/moheji/pytorch -I/home/moheji/pytorch/cmake/../third_party/benchmark/include -I/home/moheji/pytorch/third_party/onnx -I/home/moheji/pytorch/build/third_party/onnx -I/home/moheji/pytorch/third_party/foxi -I/home/moheji/pytorch/build/third_party/foxi -I/home/moheji/pytorch/c10/cuda/../.. -I/home/moheji/pytorch/c10/.. -isystem /home/moheji/pytorch/build/third_party/gloo -isystem /home/moheji/pytorch/cmake/../third_party/gloo -isystem /home/moheji/pytorch/cmake/../third_party/tensorpipe/third_party/libuv/include -isystem /home/moheji/pytorch/cmake/../third_party/googletest/googlemock/include -isystem /home/moheji/pytorch/cmake/../third_party/googletest/googletest/include -isystem /home/moheji/pytorch/third_party/protobuf/src -isystem /home/moheji/anaconda3/envs/test2/include -isystem /home/moheji/pytorch/third_party/gemmlowp -isystem /home/moheji/pytorch/third_party/neon2sse -isystem /home/moheji/pytorch/third_party/XNNPACK/include -isystem /home/moheji/pytorch/third_party/ittapi/include -isystem /home/moheji/pytorch/cmake/../third_party/eigen -isystem /usr/local/cuda/include -isystem /home/moheji/pytorch/third_party/ideep/mkl-dnn/third_party/oneDNN/include -isystem /home/moheji/pytorch/third_party/ideep/include -isystem /home/moheji/pytorch/third_party/ideep/mkl-dnn/include -D_GLIBCXX_USE_CXX11_ABI=1 -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOROCTRACER -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=bool-operation -Wnarrowing 
-Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=old-style-cast -Wno-invalid-partial-specialization -Wno-unused-private-field -Wno-aligned-allocation-unavailable -Wno-missing-braces -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow -O3 -DNDEBUG -DNDEBUG -std=gnu++17 -fPIC -DMKL_HAS_SBGEMM -DTORCH_USE_LIBUV -DCAFFE2_USE_GLOO -DTH_HAVE_THREAD -DC10_CUDA_BUILD_MAIN_LIB -fvisibility=hidden -DPYTORCH_C10_DRIVER_API_SUPPORTED -MD -MT c10/cuda/CMakeFiles/c10_cuda.dir/CUDACachingAllocator.cpp.o -MF c10/cuda/CMakeFiles/c10_cuda.dir/CUDACachingAllocator.cpp.o.d -o c10/cuda/CMakeFiles/c10_cuda.dir/CUDACachingAllocator.cpp.o -c /home/moheji/pytorch/c10/cuda/CUDACachingAllocator.cpp
/home/moheji/pytorch/c10/cuda/CUDACachingAllocator.cpp: In function ‘std::string c10::cuda::CUDACachingAllocator::reportProcessMemoryInfo(int)’:
/home/moheji/pytorch/c10/cuda/CUDACachingAllocator.cpp:1140:15: error: ‘nvmlProcessInfo_v1_t’ was not declared in this scope; did you mean ‘nvmlProcessInfo_t’?
 1140 |   std::vector<nvmlProcessInfo_v1_t> procs(8);
      |               ^~~~~~~~~~~~~~~~~~~~
      |               nvmlProcessInfo_t
/home/moheji/pytorch/c10/cuda/CUDACachingAllocator.cpp:1140:35: error: template argument 1 is invalid
 1140 |   std::vector<nvmlProcessInfo_v1_t> procs(8);
      |                                   ^
/home/moheji/pytorch/c10/cuda/CUDACachingAllocator.cpp:1140:35: error: template argument 2 is invalid
/home/moheji/pytorch/c10/cuda/CUDACachingAllocator.cpp:1141:29: error: request for member ‘size’ in ‘procs’, which is of non-class type ‘int’
 1141 |   unsigned int size = procs.size();
      |                             ^~~~
/home/moheji/pytorch/c10/cuda/CUDACachingAllocator.cpp:1144:41: error: request for member ‘data’ in ‘procs’, which is of non-class type ‘int’
 1144 |               nvml_device, &size, procs.data())) ==
      |                                         ^~~~
/home/moheji/pytorch/c10/cuda/CUDACachingAllocator.cpp:1146:11: error: request for member ‘resize’ in ‘procs’, which is of non-class type ‘int’
 1146 |     procs.resize(size);
      |           ^~~~~~
/home/moheji/pytorch/c10/cuda/CUDACachingAllocator.cpp:1153:25: error: invalid types ‘int[unsigned int]’ for array subscript
 1153 |     auto& proc = procs[i];
      |                         ^
At global scope:
cc1plus: warning: unrecognized command line option ‘-Wno-aligned-allocation-unavailable’
cc1plus: warning: unrecognized command line option ‘-Wno-unused-private-field’
cc1plus: warning: unrecognized command line option ‘-Wno-invalid-partial-specialization’
[4793/6953] Performing build step for 'nccl_external'
make -C src build BUILDDIR=/home/moheji/pytorch/build/nccl
make[1]: Entering directory '/home/moheji/pytorch/third_party/nccl/nccl/src'
NVCC_GENCODE is -gencode=arch=compute_35,code=sm_35
Grabbing   include/nccl_net.h                  > /home/moheji/pytorch/build/nccl/include/nccl_net.h
Generating nccl.pc.in                          > /home/moheji/pytorch/build/nccl/lib/pkgconfig/nccl.pc
Generating nccl.h.in                           > /home/moheji/pytorch/build/nccl/include/nccl.h
Compiling  init.cc                             > /home/moheji/pytorch/build/nccl/obj/init.o
Compiling  sendrecv.cu                         > /home/moheji/pytorch/build/nccl/obj/collectives/device/sendrecv_sum_u64.o
Compiling  sendrecv.cu                         > /home/moheji/pytorch/build/nccl/obj/collectives/device/sendrecv_sum_f64.o
Compiling  sendrecv.cu                         > /home/moheji/pytorch/build/nccl/obj/collectives/device/sendrecv_sum_f16.o
Compiling  sendrecv.cu                         > /home/moheji/pytorch/build/nccl/obj/collectives/device/sendrecv_sum_f32.o
nvcc warning : The 'compute_35', 'compute_37', 'compute_50', 'sm_35', 'sm_37' and 'sm_50' architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
nvcc warning : The 'compute_35', 'compute_37', 'compute_50', 'sm_35', 'sm_37' and 'sm_50' architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
nvcc warning : The 'compute_35', 'compute_37', 'compute_50', 'sm_35', 'sm_37' and 'sm_50' architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
Archiving  objects                             > /home/moheji/pytorch/build/nccl/obj/collectives/device/colldevice.a
make[2]: Leaving directory '/home/moheji/pytorch/third_party/nccl/nccl/src/collectives/device'
Linking    libnccl.so.2.18.3                   > /home/moheji/pytorch/build/nccl/lib/libnccl.so.2.18.3
Archiving  libnccl_static.a                    > /home/moheji/pytorch/build/nccl/lib/libnccl_static.a
make[1]: Leaving directory '/home/moheji/pytorch/third_party/nccl/nccl/src'
ninja: build stopped: subcommand failed.
(test2) moheji@ubuntu:~/pytorch$ 

What am I supposed to do? :sob:

You might need to update your driver to 470+, if I’m not mistaken.
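Once the build (and the driver update, if needed) succeeds, a quick sanity check would be something like this — it runs even without a working GPU, in which case `is_available()` just returns `False`:

```python
import torch

print(torch.__version__)
# True only if the installed driver actually supports the GPU; on old
# Kepler cards an out-of-date driver typically shows up here as False
print(torch.cuda.is_available())
if torch.cuda.is_available():
    print(torch.cuda.get_device_capability(0))  # K40c should report (3, 5)
```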