Hello, this is my first post here. If I'm doing something wrong, please tell me.
I would like to use my two K40c GPUs with PyTorch, but I keep failing to build from source.
Here is my PC info:
os: Ubuntu 18.04
nvidia-driver: nvidia-driver-450-server
cuda: 11.0
cuDNN: not installed
cmake: 3.26.4
g++: 9.4.0
I followed the instructions in the official GitHub repo and ran the following commands:
conda install cmake ninja
conda install mkl mkl-include
conda install -c pytorch magma-cuda110 # or the magma-cuda* that matches your CUDA version from https://anaconda.org/pytorch/repo
git clone --recursive https://github.com/pytorch/pytorch
cd pytorch
git submodule sync
git submodule update --init --recursive
pip install -r requirements.txt
export CMAKE_PREFIX_PATH=${CONDA_PREFIX:-"$(dirname $(which conda))/../"}
TORCH_CUDA_ARCH_LIST="3.5" python setup.py install
and I got this output:
(test2) moheji@ubuntu:~/pytorch$ TORCH_CUDA_ARCH_LIST="3.5" python setup.py install
Building wheel torch-2.1.0a0+git849fbc6
-- Building version 2.1.0a0+git849fbc6
cmake -GNinja -DBUILD_PYTHON=True -DBUILD_TEST=True -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=/home/moheji/pytorch/torch -DCMAKE_PREFIX_PATH=/home/moheji/anaconda3/envs/test2/lib/python3.8/site-packages;/home/moheji/anaconda3/envs/test2 -DNUMPY_INCLUDE_DIR=/home/moheji/anaconda3/envs/test2/lib/python3.8/site-packages/numpy/core/include -DPYTHON_EXECUTABLE=/home/moheji/anaconda3/envs/test2/bin/python -DPYTHON_INCLUDE_DIR=/home/moheji/anaconda3/envs/test2/include/python3.8 -DPYTHON_LIBRARY=/home/moheji/anaconda3/envs/test2/lib/libpython3.8.so.1.0 -DTORCH_BUILD_VERSION=2.1.0a0+git849fbc6 -DTORCH_CUDA_ARCH_LIST=3.5 -DUSE_NUMPY=True -DUSE_ROCM=0 /home/moheji/pytorch
-- The CXX compiler identification is GNU 9.4.0
-- The C compiler identification is GNU 9.4.0
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- /usr/bin/c++ /home/moheji/pytorch/torch/abi-check.cpp -o /home/moheji/pytorch/build/abi-check
-- Determined _GLIBCXX_USE_CXX11_ABI=1
-- Not forcing any particular BLAS to be found
-- Could not find ccache. Consider installing ccache to speed up compilation.
-- Performing Test C_HAS_AVX_1 - Failed
-- Performing Test C_HAS_AVX2_1 - Failed
-- Performing Test C_HAS_AVX512_1 - Failed
-- Performing Test C_HAS_AVX512_2 - Failed
-- Performing Test C_HAS_AVX512_3 - Failed
-- Performing Test CXX_HAS_AVX_1 - Failed
-- Performing Test CXX_HAS_AVX2_1 - Failed
-- Performing Test CXX_HAS_AVX512_1 - Failed
-- Performing Test CXX_HAS_AVX512_2 - Failed
-- Performing Test CXX_HAS_AVX512_3 - Failed
-- Current compiler supports avx2 extension. Will build perfkernels.
-- Current compiler supports avx512f extension. Will build fbgemm.
-- Performing Test COMPILER_SUPPORTS_RDYNAMIC
-- Performing Test COMPILER_SUPPORTS_RDYNAMIC - Success
-- Found CUDA: /usr/local/cuda (found version "11.0")
-- The CUDA compiler identification is NVIDIA 11.0.194
-- Detecting CUDA compiler ABI info - done
-- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc - skipped
-- Detecting CUDA compile features - done
-- Found CUDAToolkit: /usr/local/cuda/include (found version "11.0.194")
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Failed
-- Found Threads: TRUE
-- Caffe2: CUDA detected: 11.0
-- Caffe2: CUDA nvcc is: /usr/local/cuda/bin/nvcc
-- Caffe2: CUDA toolkit directory: /usr/local/cuda
-- Caffe2: Header version is: 11.0
-- /usr/local/cuda/lib64/libnvrtc.so shorthash is b31c2d61
-- Could NOT find CUDNN (missing: CUDNN_LIBRARY_PATH CUDNN_INCLUDE_PATH)
CMake Warning at cmake/public/cuda.cmake:251 (message):
Cannot find cuDNN library. Turning the option off
Call Stack (most recent call first):
cmake/Dependencies.cmake:44 (include)
CMakeLists.txt:722 (include)
-- Could NOT find CUSPARSELT (missing: CUSPARSELT_LIBRARY_PATH CUSPARSELT_INCLUDE_PATH)
CMake Warning at cmake/public/cuda.cmake:276 (message):
Cannot find cuSPARSELt library. Turning the option off
Call Stack (most recent call first):
cmake/Dependencies.cmake:44 (include)
CMakeLists.txt:722 (include)
-- Added CUDA NVCC flags for: -gencode;arch=compute_35,code=sm_35
-- Building using own protobuf under third_party per request.
-- Use custom protobuf build.
--
This warning is for project developers. Use -Wno-dev to suppress it.
-- Found PythonInterp: /home/moheji/anaconda3/envs/test2/bin/python (found version "3.8.17")
-- NNPACK backend is x86-64
-- Found Python: /home/moheji/anaconda3/envs/test2/bin/python3.8 (found version "3.8.17") found components: Interpreter
-- Failed to find LLVM FileCheck
-- Found Git: /usr/bin/git (found version "2.17.1")
-- git version: v1.6.1 normalized to 1.6.1
-- Version: 1.6.1
-- Looking for shm_open in rt - found
-- Performing Test HAVE_CXX_FLAG_WSHORTEN_64_TO_32 - Failed
-- Performing Test HAVE_CXX_FLAG_WD654 - Failed
-- Performing Test HAVE_CXX_FLAG_WTHREAD_SAFETY - Failed
-- Performing Test HAVE_STD_REGEX
-- Performing Test HAVE_GNU_POSIX_REGEX -- failed to compile
CMake Warning (dev) at /home/moheji/anaconda3/envs/test2/share/cmake-3.26/Modules/FindPackageHandleStandardArgs.cmake:438 (message):
The package name passed to `find_package_handle_standard_args` (OpenMP_C)
does not match the name of the calling package (OpenMP). This can lead to
problems in calling code that expects `find_package` result variables
(e.g., `_FOUND`) to follow a certain pattern.
Call Stack (most recent call first):
cmake/Modules/FindOpenMP.cmake:584 (find_package_handle_standard_args)
third_party/fbgemm/CMakeLists.txt:129 (find_package)
This warning is for project developers. Use -Wno-dev to suppress it.
CMake Warning at third_party/fbgemm/CMakeLists.txt:227 (message):
CMAKE_CXX_FLAGS_RELEASE is -O3 -DNDEBUG
CMake Warning at third_party/fbgemm/CMakeLists.txt:228 (message):
==========
** AsmJit Summary **
ASMJIT_DIR=/home/moheji/pytorch/third_party/fbgemm/third_party/asmjit
ASMJIT_TEST=FALSE
ASMJIT_TARGET_TYPE=STATIC
ASMJIT_DEPS=pthread;rt
ASMJIT_LIBS=asmjit;pthread;rt
ASMJIT_CFLAGS=-DASMJIT_STATIC
ASMJIT_PRIVATE_CFLAGS=-Wall;-Wextra;-Wconversion;-fno-math-errno;-fno-threadsafe-statics;-fno-semantic-interposition;-DASMJIT_STATIC
ASMJIT_PRIVATE_CFLAGS_DBG=
ASMJIT_PRIVATE_CFLAGS_REL=-O2;-fmerge-all-constants;-fno-enforce-eh-specs
-- Could NOT find Numa (missing: Numa_INCLUDE_DIR Numa_LIBRARIES)
CMake Warning at cmake/Dependencies.cmake:903 (message):
Not compiling with NUMA. Suppress this warning with -DUSE_NUMA=OFF
Call Stack (most recent call first):
CMakeLists.txt:722 (include)
-- Adding OpenMP CXX_FLAGS: -fopenmp
-- Will link against OpenMP libraries: /usr/lib/gcc/x86_64-linux-gnu/9/libgomp.so;/usr/lib/x86_64-linux-gnu/libpthread.so
CMake Warning at cmake/External/nccl.cmake:70 (message):
Enabling NCCL library slimming
Call Stack (most recent call first):
cmake/Dependencies.cmake:1348 (include)
CMakeLists.txt:722 (include)
-- Found CUB: /usr/local/cuda/include
-- Converting CMAKE_CUDA_FLAGS to CUDA_NVCC_FLAGS:
CUDA_NVCC_FLAGS = -D_GLIBCXX_USE_CXX11_ABI=1;-Xfatbin;-compress-all;-DONNX_NAMESPACE=onnx_torch;-gencode;arch=compute_35,code=sm_35;-Xcudafe;--diag_suppress=cc_clobber_ignored,--diag_suppress=set_but_not_used,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl;--expt-relaxed-constexpr;--expt-extended-lambda
CUDA_NVCC_FLAGS_DEBUG = -g
CUDA_NVCC_FLAGS_RELEASE = -O3;-DNDEBUG
CUDA_NVCC_FLAGS_RELWITHDEBINFO = -O2;-g;-DNDEBUG
CUDA_NVCC_FLAGS_MINSIZEREL = -O1;-DNDEBUG
-- Performing Test UV_LINT_W4
-- Performing Test UV_LINT_W4 - Failed
-- Performing Test UV_LINT_NO_UNUSED_PARAMETER_MSVC - Failed
-- Performing Test UV_LINT_NO_CONDITIONAL_CONSTANT_MSVC - Failed
-- Performing Test UV_LINT_NO_NONSTANDARD_MSVC - Failed
-- Performing Test UV_LINT_NO_NONSTANDARD_EMPTY_TU_MSVC - Failed
-- Performing Test UV_LINT_NO_NONSTANDARD_FILE_SCOPE_MSVC - Failed
-- Performing Test UV_LINT_NO_NONSTANDARD_NONSTATIC_DLIMPORT_MSVC - Failed
-- Performing Test UV_LINT_NO_HIDES_LOCAL - Failed
-- Performing Test UV_LINT_NO_HIDES_PARAM - Failed
-- Performing Test UV_LINT_NO_HIDES_GLOBAL - Failed
-- Performing Test UV_LINT_NO_CONDITIONAL_ASSIGNMENT_MSVC - Failed
-- Performing Test UV_LINT_NO_UNSAFE_MSVC - Failed
-- Performing Test UV_LINT_UTF8_MSVC - Failed
-- summary of build options:
Install prefix: /home/moheji/pytorch/torch
Target system: Linux
Compiler:
C compiler: /usr/bin/cc
CFLAGS:
For compatibility, CMake is ignoring the variable.
Call Stack (most recent call first):
third_party/gloo/cmake/Dependencies.cmake:115 (include)
third_party/gloo/CMakeLists.txt:111 (include)
This warning is for project developers. Use -Wno-dev to suppress it.
-- Found CUDAToolkit: /usr/local/cuda/include (found suitable version "11.0.194", minimum required is "7.0")
-- CUDA detected: 11.0.194
CMake Warning at cmake/Dependencies.cmake:1492 (message):
Metal is only used in ios builds.
Call Stack (most recent call first):
CMakeLists.txt:722 (include)
--
-- ******** Summary ********
-- General:
-- CMake version : 3.26.4
-- CMake command : /home/moheji/anaconda3/envs/test2/bin/cmake
-- System : Linux
-- C++ compiler : /usr/bin/c++
-- C++ compiler id : GNU
-- C++ compiler version : 9.4.0
-- Using ccache if found : ON
-- Found ccache : CCACHE_PROGRAM-NOTFOUND
-- CXX flags : -D_GLIBCXX_USE_CXX11_ABI=1 -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOROCTRACER -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=old-style-cast -Wno-invalid-partial-specialization -Wno-unused-private-field -Wno-aligned-allocation-unavailable -Wno-missing-braces -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow
-- Build type : Release
-- Compile definitions : ONNX_ML=1;ONNXIFI_ENABLE_EXT=1;ONNX_NAMESPACE=onnx_torch;IDEEP_USE_MKL;HAVE_MMAP=1;_FILE_OFFSET_BITS=64;HAVE_SHM_OPEN=1;HAVE_SHM_UNLINK=1;HAVE_MALLOC_USABLE_SIZE=1;USE_EXTERNAL_MZCRC;MINIZ_DISABLE_ZIP_READER_CRC32_CHECKS;BUILD_NVFUSER
-- CMAKE_PREFIX_PATH : /home/moheji/anaconda3/envs/test2/lib/python3.8/site-packages;/home/moheji/anaconda3/envs/test2;/usr/local/cuda;/usr/local/cuda
-- CMAKE_INSTALL_PREFIX : /home/moheji/pytorch/torch
-- USE_GOLD_LINKER : OFF
--
-- TORCH_VERSION : 2.1.0
-- BUILD_CAFFE2 : OFF
-- BUILD_CAFFE2_OPS : OFF
-- BUILD_STATIC_RUNTIME_BENCHMARK: OFF
-- BUILD_TENSOREXPR_BENCHMARK: OFF
-- BUILD_NVFUSER_BENCHMARK: OFF
-- BUILD_BINARY : OFF
-- BUILD_CUSTOM_PROTOBUF : ON
-- Link local protobuf : ON
-- BUILD_DOCS : OFF
-- BUILD_PYTHON : True
-- Python version : 3.8.17
-- Python executable : /home/moheji/anaconda3/envs/test2/bin/python
-- Pythonlibs version : 3.8.17
-- Python library : /home/moheji/anaconda3/envs/test2/lib/libpython3.8.so.1.0
-- Python includes : /home/moheji/anaconda3/envs/test2/include/python3.8
-- Python site-packages: lib/python3.8/site-packages
-- BUILD_SHARED_LIBS : ON
-- CAFFE2_USE_MSVC_STATIC_RUNTIME : OFF
-- BUILD_TEST : True
-- BUILD_JNI : OFF
-- BUILD_MOBILE_AUTOGRAD : OFF
-- BUILD_LITE_INTERPRETER: OFF
-- INTERN_BUILD_MOBILE :
-- TRACING_BASED : OFF
-- USE_BLAS : 1
-- BLAS : mkl
-- BLAS_HAS_SBGEMM :
-- USE_LAPACK : 1
-- LAPACK : mkl
-- USE_ASAN : OFF
-- USE_TSAN : OFF
-- USE_CPP_CODE_COVERAGE : OFF
-- USE_CUDA : ON
-- Split CUDA :
-- CUDA static link : OFF
-- USE_CUDNN : OFF
-- USE_EXPERIMENTAL_CUDNN_V8_API: ON
-- USE_CUSPARSELT : OFF
-- CUDA version : 11.0
-- USE_FLASH_ATTENTION : OFF
-- CUDA root directory : /usr/local/cuda
-- CUDA library : /usr/lib/x86_64-linux-gnu/libcuda.so
-- cudart library : /usr/local/cuda/lib64/libcudart.so
-- cublas library : /usr/local/cuda/lib64/libcublas.so
-- cufft library : /usr/local/cuda/lib64/libcufft.so
-- curand library : /usr/local/cuda/lib64/libcurand.so
-- cusparse library : /usr/local/cuda/lib64/libcusparse.so
-- nvrtc : /usr/local/cuda/lib64/libnvrtc.so
-- CUDA include path : /usr/local/cuda/include
-- NVCC executable : /usr/local/cuda/bin/nvcc
-- CUDA compiler : /usr/local/cuda/bin/nvcc
-- CUDA flags : -D_GLIBCXX_USE_CXX11_ABI=1 -Xfatbin -compress-all -DONNX_NAMESPACE=onnx_torch -gencode arch=compute_35,code=sm_35 -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=set_but_not_used,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -Wno-deprecated-gpu-targets --expt-extended-lambda -DCUDA_HAS_FP16=1 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__
-- CUDA host compiler :
-- CUDA --device-c : OFF
-- USE_TENSORRT : OFF
-- USE_ROCM : 0
-- BUILD_NVFUSER : ON
-- USE_EIGEN_FOR_BLAS :
-- USE_FBGEMM : ON
-- USE_FAKELOWP : OFF
-- USE_KINETO : ON
-- USE_FFMPEG : OFF
-- USE_GFLAGS : OFF
-- USE_GLOG : OFF
-- USE_LEVELDB : OFF
-- USE_LITE_PROTO : OFF
-- USE_LMDB : OFF
-- USE_METAL : OFF
-- USE_PYTORCH_METAL : OFF
-- USE_PYTORCH_METAL_EXPORT : OFF
-- USE_MPS : OFF
-- USE_FFTW : OFF
-- USE_MKL : ON
-- USE_MKLDNN : ON
-- USE_MKLDNN_ACL : OFF
-- USE_MKLDNN_CBLAS : OFF
-- USE_UCC : OFF
-- USE_ITT : ON
-- USE_NCCL : ON
-- USE_SYSTEM_NCCL : OFF
-- USE_NCCL_WITH_UCC : OFF
-- USE_NNPACK : ON
-- USE_NUMPY : ON
-- USE_OBSERVERS : ON
-- USE_OPENCL : OFF
-- USE_OPENCV : OFF
-- USE_OPENMP : ON
-- USE_TBB : OFF
-- USE_MIMALLOC : OFF
-- USE_VULKAN : OFF
-- USE_PROF : OFF
-- USE_QNNPACK : ON
-- USE_PYTORCH_QNNPACK : ON
-- USE_XNNPACK : ON
-- USE_REDIS : OFF
-- USE_ROCKSDB : OFF
-- USE_ZMQ : OFF
-- USE_DISTRIBUTED : ON
-- USE_MPI : OFF
-- USE_GLOO : ON
-- USE_GLOO_WITH_OPENSSL : OFF
-- USE_TENSORPIPE : ON
-- Public Dependencies : caffe2::mkl
-- Private Dependencies : Threads::Threads;pthreadpool;cpuinfo;qnnpack;pytorch_qnnpack;nnpack;XNNPACK;fbgemm;ittnotify;fp16;caffe2::openmp;tensorpipe;gloo;foxi_loader;rt;fmt::fmt-header-only;kineto;gcc_s;gcc;dl
-- Public CUDA Deps. : caffe2::cufft;caffe2::curand;caffe2::cublas
-- Private CUDA Deps. : __caffe2_nccl;tensorpipe_cuda;gloo_cuda;/usr/local/cuda/lib64/libcudart.so;CUDA::cusparse;CUDA::curand;CUDA::cufft;ATEN_CUDA_FILES_GEN_LIB
-- USE_COREML_DELEGATE : OFF
-- BUILD_LAZY_TS_BACKEND : ON
-- TORCH_DISABLE_GPU_ASSERTS : OFF
-- Performing Test HAS_WMISSING_PROTOTYPES
-- Performing Test HAS_WMISSING_PROTOTYPES - Failed
-- Performing Test HAS_WERROR_MISSING_PROTOTYPES
-- Performing Test HAS_WERROR_MISSING_PROTOTYPES - Failed
-- Configuring done (76.9s)
CMake Warning at caffe2/CMakeLists.txt:813 (add_library):
Cannot generate a safe runtime search path for target torch_cpu because
files in some directories may conflict with libraries in implicit
directories:
runtime library [libgomp.so.1] in /usr/lib/gcc/x86_64-linux-gnu/9 may be hidden by files in:
/home/moheji/anaconda3/envs/test2/lib
Some of these libraries may not be found correctly.
-- Generating done (2.3s)
-- Build files have been written to: /home/moheji/pytorch/build
cmake --build . --target install --config Release
[3/4] Generating ATen sources
[31/6953] Building CXX object third_party/protobuf...-lite.dir/__/src/google/protobuf/message_lite.cc.o
In file included from /usr/include/string.h:494,
generating /home/moheji/pytorch/build/third_party/onnx/onnx/onnx_data_pb.py
[4564/6953] Building CXX object third_party/kineto...Files/kineto_base.dir/src/DaemonConfigLoader.cpp.o
In file included from /home/moheji/pytorch/third_party/kineto/libkineto/third_party/dynolog/dynolog/src/ipcfabric/FabricManager.h:12,
from /home/moheji/pytorch/third_party/kineto/libkineto/src/IpcFabricConfigClient.h:21,
from /home/moheji/pytorch/third_party/kineto/libkineto/src/DaemonConfigLoader.cpp:16:
/home/moheji/pytorch/third_party/kineto/libkineto/third_party/dynolog/dynolog/src/ipcfabric/Endpoint.h: In instantiation of ‘bool dynolog::ipcfabric::EndPoint<kMaxNumFds>::tryPeekMsg(dynolog::ipcfabric::EndPoint<kMaxNumFds>::TCtxt&) [with long unsigned int kMaxNumFds = 0; dynolog::ipcfabric::EndPoint<kMaxNumFds>::TCtxt = dynolog::ipcfabric::EndPointCtxt<0>]’:
/home/moheji/pytorch/third_party/kineto/libkineto/third_party/dynolog/dynolog/src/ipcfabric/FabricManager.h:155:44: required from here
/home/moheji/pytorch/third_party/kineto/libkineto/third_party/dynolog/dynolog/src/ipcfabric/Endpoint.h:174:50: warning: throw will always call terminate() [-Wterminate]
174 | throw std::runtime_error(std::strerror(errno));
| ^
/home/moheji/pytorch/third_party/kineto/libkineto/third_party/dynolog/dynolog/src/ipcfabric/Endpoint.h: In instantiation of ‘const char* dynolog::ipcfabric::EndPoint<kMaxNumFds>::getName(const TCtxt&) const [with long unsigned int kMaxNumFds = 0; dynolog::ipcfabric::EndPoint<kMaxNumFds>::TCtxt = dynolog::ipcfabric::EndPointCtxt<0>]’:
/home/moheji/pytorch/third_party/kineto/libkineto/third_party/dynolog/dynolog/src/ipcfabric/FabricManager.h:170:54: required from here
/home/moheji/pytorch/third_party/kineto/libkineto/third_party/dynolog/dynolog/src/ipcfabric/Endpoint.h:187:66: warning: throw will always call terminate() [-Wterminate]
187 | ". Expected to start with " + std::string(socket_dir));
| ^
/home/moheji/pytorch/third_party/kineto/libkineto/third_party/dynolog/dynolog/src/ipcfabric/Endpoint.h:194:48: warning: throw will always call terminate() [-Wterminate]
194 | std::string(ctxt.msg_name.sun_path));
| ^
/home/moheji/pytorch/third_party/kineto/libkineto/third_party/dynolog/dynolog/src/ipcfabric/Endpoint.h: In instantiation of ‘bool dynolog::ipcfabric::EndPoint<kMaxNumFds>::tryRcvMsg(dynolog::ipcfabric::EndPoint<kMaxNumFds>::TCtxt&) [with long unsigned int kMaxNumFds = 0; dynolog::ipcfabric::EndPoint<kMaxNumFds>::TCtxt = dynolog::ipcfabric::EndPointCtxt<0>]’:
/home/moheji/pytorch/third_party/kineto/libkineto/third_party/dynolog/dynolog/src/ipcfabric/FabricManager.h:178:45: required from here
/home/moheji/pytorch/third_party/kineto/libkineto/third_party/dynolog/dynolog/src/ipcfabric/Endpoint.h:160:50: warning: throw will always call terminate() [-Wterminate]
160 | throw std::runtime_error(std::strerror(errno));
| ^
[4574/6953] Building CXX object third_party/kineto...es/kineto_base.dir/src/IpcFabricConfigClient.cpp.o
In file included from /home/moheji/pytorch/third_party/kineto/libkineto/third_party/dynolog/dynolog/src/ipcfabric/FabricManager.h:12,
from /home/moheji/pytorch/third_party/kineto/libkineto/src/IpcFabricConfigClient.h:21,
from /home/moheji/pytorch/third_party/kineto/libkineto/src/IpcFabricConfigClient.cpp:11:
/home/moheji/pytorch/third_party/kineto/libkineto/third_party/dynolog/dynolog/src/ipcfabric/Endpoint.h: In instantiation of ‘bool dynolog::ipcfabric::EndPoint<kMaxNumFds>::tryPeekMsg(dynolog::ipcfabric::EndPoint<kMaxNumFds>::TCtxt&) [with long unsigned int kMaxNumFds = 0; dynolog::ipcfabric::EndPoint<kMaxNumFds>::TCtxt = dynolog::ipcfabric::EndPointCtxt<0>]’:
/home/moheji/pytorch/third_party/kineto/libkineto/third_party/dynolog/dynolog/src/ipcfabric/FabricManager.h:155:44: required from here
/home/moheji/pytorch/third_party/kineto/libkineto/third_party/dynolog/dynolog/src/ipcfabric/Endpoint.h:174:50: warning: throw will always call terminate() [-Wterminate]
174 | throw std::runtime_error(std::strerror(errno));
| ^
/home/moheji/pytorch/third_party/kineto/libkineto/third_party/dynolog/dynolog/src/ipcfabric/Endpoint.h: In instantiation of ‘const char* dynolog::ipcfabric::EndPoint<kMaxNumFds>::getName(const TCtxt&) const [with long unsigned int kMaxNumFds = 0; dynolog::ipcfabric::EndPoint<kMaxNumFds>::TCtxt = dynolog::ipcfabric::EndPointCtxt<0>]’:
/home/moheji/pytorch/third_party/kineto/libkineto/third_party/dynolog/dynolog/src/ipcfabric/FabricManager.h:170:54: required from here
/home/moheji/pytorch/third_party/kineto/libkineto/third_party/dynolog/dynolog/src/ipcfabric/Endpoint.h:187:66: warning: throw will always call terminate() [-Wterminate]
187 | ". Expected to start with " + std::string(socket_dir));
| ^
/home/moheji/pytorch/third_party/kineto/libkineto/third_party/dynolog/dynolog/src/ipcfabric/Endpoint.h:194:48: warning: throw will always call terminate() [-Wterminate]
194 | std::string(ctxt.msg_name.sun_path));
| ^
/home/moheji/pytorch/third_party/kineto/libkineto/third_party/dynolog/dynolog/src/ipcfabric/Endpoint.h: In instantiation of ‘bool dynolog::ipcfabric::EndPoint<kMaxNumFds>::tryRcvMsg(dynolog::ipcfabric::EndPoint<kMaxNumFds>::TCtxt&) [with long unsigned int kMaxNumFds = 0; dynolog::ipcfabric::EndPoint<kMaxNumFds>::TCtxt = dynolog::ipcfabric::EndPointCtxt<0>]’:
/home/moheji/pytorch/third_party/kineto/libkineto/third_party/dynolog/dynolog/src/ipcfabric/FabricManager.h:178:45: required from here
/home/moheji/pytorch/third_party/kineto/libkineto/third_party/dynolog/dynolog/src/ipcfabric/Endpoint.h:160:50: warning: throw will always call terminate() [-Wterminate]
160 | throw std::runtime_error(std::strerror(errno));
| ^
[4717/6953] Generating include/renameavx512fnofma.h
Generating renameavx512fnofma.h: mkrename cinz_ 8 16 avx512fnofma
[4723/6953] Generating include/renameavx512f.h
Generating renameavx512f.h: mkrename finz_ 8 16 avx512f
[4726/6953] Generating include/renameavx2.h
Generating renameavx2.h: mkrename finz_ 4 8 avx2
[4727/6953] Generating include/renameavx2128.h
Generating renameavx2128.h: mkrename finz_ 2 4 avx2128
[4764/6953] Building CXX object c10/cuda/CMakeFiles/c10_cuda.dir/CUDACachingAllocator.cpp.o
FAILED: c10/cuda/CMakeFiles/c10_cuda.dir/CUDACachingAllocator.cpp.o
/usr/bin/c++ -DHAVE_MALLOC_USABLE_SIZE=1 -DHAVE_MMAP=1 -DHAVE_SHM_OPEN=1 -DHAVE_SHM_UNLINK=1 -DIDEEP_USE_MKL -DMINIZ_DISABLE_ZIP_READER_CRC32_CHECKS -DONNXIFI_ENABLE_EXT=1 -DONNX_ML=1 -DONNX_NAMESPACE=onnx_torch -DUSE_EXTERNAL_MZCRC -D_FILE_OFFSET_BITS=64 -Dc10_cuda_EXPORTS -I/home/moheji/pytorch/build/aten/src -I/home/moheji/pytorch/aten/src -I/home/moheji/pytorch/build -I/home/moheji/pytorch -I/home/moheji/pytorch/cmake/../third_party/benchmark/include -I/home/moheji/pytorch/third_party/onnx -I/home/moheji/pytorch/build/third_party/onnx -I/home/moheji/pytorch/third_party/foxi -I/home/moheji/pytorch/build/third_party/foxi -I/home/moheji/pytorch/c10/cuda/../.. -I/home/moheji/pytorch/c10/.. -isystem /home/moheji/pytorch/build/third_party/gloo -isystem /home/moheji/pytorch/cmake/../third_party/gloo -isystem /home/moheji/pytorch/cmake/../third_party/tensorpipe/third_party/libuv/include -isystem /home/moheji/pytorch/cmake/../third_party/googletest/googlemock/include -isystem /home/moheji/pytorch/cmake/../third_party/googletest/googletest/include -isystem /home/moheji/pytorch/third_party/protobuf/src -isystem /home/moheji/anaconda3/envs/test2/include -isystem /home/moheji/pytorch/third_party/gemmlowp -isystem /home/moheji/pytorch/third_party/neon2sse -isystem /home/moheji/pytorch/third_party/XNNPACK/include -isystem /home/moheji/pytorch/third_party/ittapi/include -isystem /home/moheji/pytorch/cmake/../third_party/eigen -isystem /usr/local/cuda/include -isystem /home/moheji/pytorch/third_party/ideep/mkl-dnn/third_party/oneDNN/include -isystem /home/moheji/pytorch/third_party/ideep/include -isystem /home/moheji/pytorch/third_party/ideep/mkl-dnn/include -D_GLIBCXX_USE_CXX11_ABI=1 -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOROCTRACER -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=bool-operation -Wnarrowing 
-Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=old-style-cast -Wno-invalid-partial-specialization -Wno-unused-private-field -Wno-aligned-allocation-unavailable -Wno-missing-braces -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow -O3 -DNDEBUG -DNDEBUG -std=gnu++17 -fPIC -DMKL_HAS_SBGEMM -DTORCH_USE_LIBUV -DCAFFE2_USE_GLOO -DTH_HAVE_THREAD -DC10_CUDA_BUILD_MAIN_LIB -fvisibility=hidden -DPYTORCH_C10_DRIVER_API_SUPPORTED -MD -MT c10/cuda/CMakeFiles/c10_cuda.dir/CUDACachingAllocator.cpp.o -MF c10/cuda/CMakeFiles/c10_cuda.dir/CUDACachingAllocator.cpp.o.d -o c10/cuda/CMakeFiles/c10_cuda.dir/CUDACachingAllocator.cpp.o -c /home/moheji/pytorch/c10/cuda/CUDACachingAllocator.cpp
/home/moheji/pytorch/c10/cuda/CUDACachingAllocator.cpp: In function ‘std::string c10::cuda::CUDACachingAllocator::reportProcessMemoryInfo(int)’:
/home/moheji/pytorch/c10/cuda/CUDACachingAllocator.cpp:1140:15: error: ‘nvmlProcessInfo_v1_t’ was not declared in this scope; did you mean ‘nvmlProcessInfo_t’?
1140 | std::vector<nvmlProcessInfo_v1_t> procs(8);
| ^~~~~~~~~~~~~~~~~~~~
| nvmlProcessInfo_t
/home/moheji/pytorch/c10/cuda/CUDACachingAllocator.cpp:1140:35: error: template argument 1 is invalid
1140 | std::vector<nvmlProcessInfo_v1_t> procs(8);
| ^
/home/moheji/pytorch/c10/cuda/CUDACachingAllocator.cpp:1140:35: error: template argument 2 is invalid
/home/moheji/pytorch/c10/cuda/CUDACachingAllocator.cpp:1141:29: error: request for member ‘size’ in ‘procs’, which is of non-class type ‘int’
1141 | unsigned int size = procs.size();
| ^~~~
/home/moheji/pytorch/c10/cuda/CUDACachingAllocator.cpp:1144:41: error: request for member ‘data’ in ‘procs’, which is of non-class type ‘int’
1144 | nvml_device, &size, procs.data())) ==
| ^~~~
/home/moheji/pytorch/c10/cuda/CUDACachingAllocator.cpp:1146:11: error: request for member ‘resize’ in ‘procs’, which is of non-class type ‘int’
1146 | procs.resize(size);
| ^~~~~~
/home/moheji/pytorch/c10/cuda/CUDACachingAllocator.cpp:1153:25: error: invalid types ‘int[unsigned int]’ for array subscript
1153 | auto& proc = procs[i];
| ^
At global scope:
cc1plus: warning: unrecognized command line option ‘-Wno-aligned-allocation-unavailable’
cc1plus: warning: unrecognized command line option ‘-Wno-unused-private-field’
cc1plus: warning: unrecognized command line option ‘-Wno-invalid-partial-specialization’
[4793/6953] Performing build step for 'nccl_external'
make -C src build BUILDDIR=/home/moheji/pytorch/build/nccl
make[1]: Entering directory '/home/moheji/pytorch/third_party/nccl/nccl/src'
NVCC_GENCODE is -gencode=arch=compute_35,code=sm_35
Grabbing include/nccl_net.h > /home/moheji/pytorch/build/nccl/include/nccl_net.h
Generating nccl.pc.in > /home/moheji/pytorch/build/nccl/lib/pkgconfig/nccl.pc
Generating nccl.h.in > /home/moheji/pytorch/build/nccl/include/nccl.h
Compiling init.cc > /home/moheji/pytorch/build/nccl/obj/init.oecv_sum_u64.o
Compiling sendrecv.cu > /home/moheji/pytorch/build/nccl/obj/collectives/device/sendrecv_sum_f64.o
Compiling sendrecv.cu > /home/moheji/pytorch/build/nccl/obj/collectives/device/sendrecv_sum_f16.o
Compiling sendrecv.cu > /home/moheji/pytorch/build/nccl/obj/collectives/device/sendrecv_sum_f32.o
nvcc warning : The 'compute_35', 'compute_37', 'compute_50', 'sm_35', 'sm_37' and 'sm_50' architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
nvcc warning : The 'compute_35', 'compute_37', 'compute_50', 'sm_35', 'sm_37' and 'sm_50' architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
nvcc warning : The 'compute_35', 'compute_37', 'compute_50', 'sm_35', 'sm_37' and 'sm_50' architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
Archiving objects > /home/moheji/pytorch/build/nccl/obj/collectives/device/colldevice.a
make[2]: Leaving directory '/home/moheji/pytorch/third_party/nccl/nccl/src/collectives/device'
Linking libnccl.so.2.18.3 > /home/moheji/pytorch/build/nccl/lib/libnccl.so.2.18.3
Archiving libnccl_static.a > /home/moheji/pytorch/build/nccl/lib/libnccl_static.a
make[1]: Leaving directory '/home/moheji/pytorch/third_party/nccl/nccl/src'
ninja: build stopped: subcommand failed.
(test2) moheji@ubuntu:~/pytorch$
What am I supposed to do?