Building from source: Getting rid of build warnings so I (hopefully) get all tests to pass

I have been trying for quite a while now to get a PyTorch version built from source that I can use for contributing.
I'm cloning the repo and running python setup.py develop inside a designated Anaconda environment. The build completes without errors. However, my problem is that I can never get the tests in test/run_test.sh to pass.

Since I'm out of ideas, I'm currently trying to eliminate all warnings I get during the build process, in the hope that once the build is clean, the tests will also pass.

Currently, I'm stuck on the following warnings, for which I couldn't find a satisfactory answer after googling. I have omitted warnings that I think are inconsequential for now.

CMake Warning at /home/user/miniconda3/envs/torchdev37/lib/python3.7/site-packages/pybind11/share/cmake/pybind11/pybind11Tools.cmake:19 (message):
  Set PYBIND11_PYTHON_VERSION to search for a specific version, not
  PYTHON_VERSION (which is an output).  Assuming that is what you meant to do
  and continuing anyway.

I installed pybind11 via pip inside my Miniconda environment. I'm not sure what I'm supposed to do here.
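For what it's worth, the warning itself hints at the fix: PYBIND11_PYTHON_VERSION is the input variable pybind11 expects, while PYTHON_VERSION is an output it computes. Here is a minimal sketch; whether setup.py actually forwards this environment variable to CMake is an assumption on my part (with a direct cmake invocation you would pass it as -DPYBIND11_PYTHON_VERSION=3.7):

```shell
# PYBIND11_PYTHON_VERSION is the input pybind11's CMake code reads;
# PYTHON_VERSION is an output it sets. Exporting the input variable before
# the build (assuming it is forwarded to CMake) should silence the warning.
export PYBIND11_PYTHON_VERSION=3.7
echo "PYBIND11_PYTHON_VERSION=$PYBIND11_PYTHON_VERSION"
```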

CMake Warning at cmake/External/nccl.cmake:62 (message):
  Objcopy version is too old to support NCCL library slimming

Googling 'update objcopy' didn't turn up anything useful. I'm on a freshly set-up Ubuntu 20.04 machine.
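Since objcopy ships as part of GNU binutils, the first step is probably to check which version is installed; the version check that triggers this warning lives in cmake/External/nccl.cmake. This is only a sketch, assuming an Ubuntu-style setup where upgrading binutils is the route to a newer objcopy:

```shell
# objcopy is part of binutils; see what version is installed. On Ubuntu,
# upgrading binutils would update it:
#   sudo apt-get install --only-upgrade binutils
ver="$(objcopy --version 2>/dev/null | head -n 1)"
echo "objcopy: ${ver:-not found}"
```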

CMake Warning (dev) at third_party/gloo/CMakeLists.txt:21 (option):
  Policy CMP0077 is not set: option() honors normal variables.  Run "cmake
  --help-policy CMP0077" for policy details.  Use the cmake_policy command to
  set the policy and suppress this warning.

Not sure whether this is important.
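From what I can tell this one is benign: CMP0077 only changes whether option() respects variables already set by the parent project. If you want to silence it, CMake allows defaulting the policy to NEW from the command line; how exactly to thread extra flags through setup.py is something I'm not sure about, so this just shows the flag itself:

```shell
# CMAKE_POLICY_DEFAULT_CMP0077=NEW tells CMake to apply the NEW behavior of
# CMP0077 (option() honors normal variables) in subprojects that don't set
# the policy themselves, which suppresses this dev warning.
CMAKE_ARGS="-DCMAKE_POLICY_DEFAULT_CMP0077=NEW"
echo "extra cmake args: $CMAKE_ARGS"
```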

  4. (Edit: I got this one resolved.)
-- Could NOT find NCCL (missing: NCCL_INCLUDE_DIR NCCL_LIBRARY)

This one is super weird. I explicitly set these, and echo $NCCL_INCLUDE_DIR gives /home/user/miniconda3/pkgs/nccl-1.3.5-cuda10.0_0/include/, and similarly for the library path.

CMake Warning at cmake/Modules_CUDA_fix/upstream/FindCUDA.cmake:1915 (add_executable):
  Cannot generate a safe runtime search path for target
  generate_proposals_op_gpu_test because files in some directories may
  conflict with libraries in implicit directories:

    runtime library [libnvToolsExt.so.1] in /usr/lib/x86_64-linux-gnu may be hidden by files in:
      /usr/local/cuda-11.2/lib64

  Some of these libraries may not be found correctly.

I got multiple warnings of the above form, all complaining specifically about

    runtime library [libnvToolsExt.so.1] in /usr/lib/x86_64-linux-gnu may be hidden by files in:
      /usr/local/cuda-11.2/lib64

I read Stack Overflow threads that dealt with this error message, but they didn't seem actionable for this case, or maybe I just didn't understand what exactly I'm supposed to do to remedy it. Set some environment variable, maybe?
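To at least see what the warning is about, one can list the copies of the library in both directories it names. The paths below are taken straight from the warning and would need adjusting on another system:

```shell
# The warning means two copies of libnvToolsExt.so.1 are visible on the
# runtime search path. List whichever copies actually exist on this machine:
found=""
for d in /usr/lib/x86_64-linux-gnu /usr/local/cuda-11.2/lib64; do
  for f in "$d"/libnvToolsExt.so*; do
    [ -e "$f" ] && found="$found $f"
  done
done
echo "copies found:${found:- none}"
```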

I would be really grateful for help in resolving these.

For now, I was able to resolve warning 4:

-- Could NOT find NCCL (missing: NCCL_INCLUDE_DIR NCCL_LIBRARY)

This is a confusing warning, because setting the environment variables NCCL_INCLUDE_DIR and NCCL_LIBRARY does not resolve the issue. Instead, the solution is to set the environment variable NCCL_ROOT_DIR. In my case, I set it to /home/user/miniconda3/pkgs/nccl-1.3.5-cuda10.0_0/.
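For reference, the exact command I used; the path is specific to my machine:

```shell
# FindNCCL picks the library up via NCCL_ROOT_DIR rather than the per-path
# variables, so point it at the prefix containing include/ and lib/:
export NCCL_ROOT_DIR=/home/user/miniconda3/pkgs/nccl-1.3.5-cuda10.0_0/
echo "NCCL_ROOT_DIR=$NCCL_ROOT_DIR"
```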

The other warnings are still there.

Are you using the base environment?
Usually it is recommended to create separate environments for different projects, to avoid conflicts between packages.
So you could create a new environment, install PyTorch first, and then all the other packages.
Nevertheless, I don't think the problem is caused by PyTorch in this case. NCCL is an NVIDIA library, so maybe you could try uninstalling and reinstalling CUDA.
My suggestion is:
create a new environment, install PyTorch first and then all the other packages you need for your project, and see if it works. Otherwise, try reinstalling CUDA as well.

Check this CONDA CHEAT SHEET to see how to create a new env and activate it.

I am not using the base environment. I created a separate environment only for the PyTorch build.
Thank you for the suggestion regarding CUDA, but as I wrote in my last post, I already got rid of the NCCL warning (4 of 5). I'm now left with warnings 1, 2, 3 and 5 described in my first post.

Usually most dependencies like pybind11 should be "vendored", i.e. live in git submodules under third_party rather than be installed through conda (use git submodule update --init --recursive or so if you don't have them).

Notable exceptions are the compute libraries (cuDNN, Magma, MKL, …) and two or three Python packages: numpy, pyyaml and one I forget.

I’m assuming you generally follow the instructions in CONTRIBUTING.md?

Best regards

Thomas


Yes, I do follow the instructions in CONTRIBUTING.md, and more specifically the ones on how to build from source. The only thing I did differently before was that I didn't install some of the required packages through conda, but used pip or apt-get instead. As a sanity check, I have now installed everything through conda, inside a new environment with Python 3.7.9.

However, now the build fails completely, whereas before I had a 'working' build with failing tests from test/run_test.sh. Here is the beginning of the output of python setup.py develop, minus the error messages.

Submodule path 'android/libs/fbjni': checked out 'b592c5591345a05341ed6cd31d214e71e8bf4229'
Submodule path 'third_party/FP16': checked out '4dfe081cf6bcd15db339cf2680b9281b8451eeb3'
Submodule path 'third_party/FXdiv': checked out 'b408327ac2a15ec3e43352421954f5b1967701d1'
Submodule path 'third_party/NNPACK': checked out 'c07e3a0400713d546e0dea2d5466dd22ea389c73'
Submodule path 'third_party/QNNPACK': checked out '7d2a4e9931a82adc3814275b6219a03e24e36b4c'
Submodule path 'third_party/XNNPACK': checked out '55d53a4e7079d38e90acd75dd9e4f9e781d2da35'
Submodule path 'third_party/benchmark': checked out '505be96ab23056580a3a2315abba048f4428b04e'
Submodule path 'third_party/cpuinfo': checked out '5916273f79a21551890fd3d56fc5375a78d1598d'
Submodule path 'third_party/cub': checked out 'd106ddb991a56c3df1b6d51b2409e36ba8181ce4'
Submodule path 'third_party/eigen': checked out 'd41dc4dd74acce21fb210e7625d5d135751fa9e5'
Submodule path 'third_party/fbgemm': checked out '580d6371fb4c4c606f6dcbb5b11085f5cfc73361'
Submodule path 'third_party/fbgemm/third_party/asmjit': checked out '8b35b4cffb62ecb58a903bf91cb7537d7a672211'
Submodule path 'third_party/fbgemm/third_party/cpuinfo': checked out 'ed8b86a253800bafdb7b25c5c399f91bff9cb1f3'
Submodule path 'third_party/fbgemm/third_party/googletest': checked out 'cbf019de22c8dd37b2108da35b2748fd702d1796'
Submodule path 'third_party/fmt': checked out 'cd4af11efc9c622896a3e4cb599fa28668ca3d05'
Submodule path 'third_party/foxi': checked out 'bd6feb6d0d3fc903df42b4feb82a602a5fcb1fd5'
Submodule path 'third_party/gemmlowp/gemmlowp': checked out '3fb5c176c17c765a3492cd2f0321b0dab712f350'
Submodule path 'third_party/gloo': checked out '6f7095f6e9860ce4fd682a7894042e6eba0996f1'
Submodule path 'third_party/googletest': checked out '2fe3bd994b3189899d93f1d5a881e725e046fdc2'
Submodule path 'third_party/ideep': checked out 'f9468ff1a3d601b509ebe2c17d2ed0a58dffacee'
Submodule path 'third_party/ideep/mkl-dnn': checked out '98be7e8afa711dc9b66c8ff3504129cb82013cdb'
Submodule path 'third_party/ios-cmake': checked out '8abaed637d56f1337d6e1d2c4026e25c1eade724'
Submodule path 'third_party/kineto': checked out '87c2a839b63f29ad0238345ab9d8dba5fde57f91'
Submodule path 'third_party/kineto/libkineto/third_party/fmt': checked out '2591ab91c3898c9f6544fff04660276537d32ffd'
Submodule path 'third_party/kineto/libkineto/third_party/googletest': checked out '7aca84427f224eeed3144123d5230d5871e93347'
Submodule path 'third_party/nccl/nccl': checked out '033d799524fb97629af5ac2f609de367472b2696'
Submodule path 'third_party/neon2sse': checked out '97a126f08ce318023be604d03f88bf0820a9464a'
Submodule path 'third_party/onnx': checked out '54c38e6eaf557b844e70cebc00f39ced3321e9ad'
Submodule path 'third_party/onnx/third_party/benchmark': checked out 'e776aa0275e293707b6a0901e0e8d8a8a3679508'
Submodule path 'third_party/onnx/third_party/pybind11': checked out '80d452484c5409444b0ec19383faa84bb7a4d351'
Submodule path 'third_party/onnx/third_party/pybind11/tools/clang': checked out '6a00cbc4a9b8e68b71caf7f774b3f9c753ae84d5'
Submodule path 'third_party/onnx-tensorrt': checked out 'c153211418a7c57ce071d9ce2a41f8d1c85a878f'
Submodule path 'third_party/onnx-tensorrt/third_party/onnx': checked out '765f5ee823a67a866f4bd28a9860e81f3c811ce8'
Submodule path 'third_party/onnx-tensorrt/third_party/onnx/third_party/benchmark': checked out 'e776aa0275e293707b6a0901e0e8d8a8a3679508'
Submodule path 'third_party/onnx-tensorrt/third_party/onnx/third_party/pybind11': checked out 'a1041190c8b8ff0cd9e2f0752248ad5e3789ea0c'
Submodule path 'third_party/onnx-tensorrt/third_party/onnx/third_party/pybind11/tools/clang': checked out '6a00cbc4a9b8e68b71caf7f774b3f9c753ae84d5'
Submodule path 'third_party/protobuf': checked out 'd0bfd5221182da1a7cc280f3337b5e41a89539cf'
Submodule path 'third_party/protobuf/third_party/benchmark': checked out '5b7683f49e1e9223cf9927b24f6fd3d6bd82e3f8'
Submodule path 'third_party/protobuf/third_party/googletest': checked out '5ec7f0c4a113e2f18ac2c6cc7df51ad6afc24081'
Submodule path 'third_party/psimd': checked out '072586a71b55b7f8c584153d223e95687148a900'
Submodule path 'third_party/pthreadpool': checked out 'a134dd5d4cee80cce15db81a72e7f929d71dd413'
Submodule path 'third_party/pybind11': checked out '8de7772cc72daca8e947b79b83fea46214931604'
Submodule path 'third_party/python-enum': checked out '4cfedc426c4e2fc52e3f5c2b4297e15ed8d6b8c7'
Submodule path 'third_party/python-peachpy': checked out '07d8fde8ac45d7705129475c0f94ed8925b93473'
Submodule path 'third_party/python-six': checked out '15e31431af97e5e64b80af0a3f598d382bcdd49a'
Submodule path 'third_party/sleef': checked out 'e0a003ee838b75d11763aa9c3ef17bf71a725bff'
Submodule path 'third_party/tbb': checked out 'a51a90bc609bb73db8ea13841b5cf7aa4344d4a9'
Submodule path 'third_party/tensorpipe': checked out 'daa6e23a1f41d7a0a7227b1a0e541414da1f251d'
Submodule path 'third_party/tensorpipe/third_party/googletest': checked out 'aee0f9d9b5b87796ee8a0ab26b7587ec30e8858e'
Submodule path 'third_party/tensorpipe/third_party/libnop': checked out 'aa95422ea8c409e3f078d2ee7708a5f59a8b9fa2'
Submodule path 'third_party/tensorpipe/third_party/libuv': checked out '1dff88e5161cba5c59276d2070d2e304e4dcb242'
Submodule path 'third_party/tensorpipe/third_party/pybind11': checked out 'a23996fce38ff6ccfbcdc09f1e63f2c4be5ea2ef'
Submodule path 'third_party/tensorpipe/third_party/pybind11/tools/clang': checked out '6a00cbc4a9b8e68b71caf7f774b3f9c753ae84d5'
Submodule path 'third_party/zstd': checked out 'aec56a52fbab207fc639a1937d1e708a282edca8'
-- The CXX compiler identification is GNU 9.3.0
-- The C compiler identification is GNU 9.3.0
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Not forcing any particular BLAS to be found
-- Performing Test COMPILER_WORKS
-- Performing Test COMPILER_WORKS - Success
-- Performing Test SUPPORT_GLIBCXX_USE_C99
-- Performing Test SUPPORT_GLIBCXX_USE_C99 - Success
-- Performing Test CAFFE2_EXCEPTION_PTR_SUPPORTED
-- Performing Test CAFFE2_EXCEPTION_PTR_SUPPORTED - Success
-- std::exception_ptr is supported.
-- Performing Test CAFFE2_NEED_TO_TURN_OFF_DEPRECATION_WARNING
-- Performing Test CAFFE2_NEED_TO_TURN_OFF_DEPRECATION_WARNING - Failed
-- Turning off deprecation warning due to glog.
-- Performing Test CAFFE2_COMPILER_SUPPORTS_AVX2_EXTENSIONS
-- Performing Test CAFFE2_COMPILER_SUPPORTS_AVX2_EXTENSIONS - Success
-- Current compiler supports avx2 extension. Will build perfkernels.
-- Performing Test CAFFE2_COMPILER_SUPPORTS_AVX512_EXTENSIONS
-- Performing Test CAFFE2_COMPILER_SUPPORTS_AVX512_EXTENSIONS - Success
-- Current compiler supports avx512f extension. Will build fbgemm.
-- Performing Test COMPILER_SUPPORTS_HIDDEN_VISIBILITY
-- Performing Test COMPILER_SUPPORTS_HIDDEN_VISIBILITY - Success
-- Performing Test COMPILER_SUPPORTS_HIDDEN_INLINE_VISIBILITY
-- Performing Test COMPILER_SUPPORTS_HIDDEN_INLINE_VISIBILITY - Success
-- Performing Test COMPILER_SUPPORTS_RDYNAMIC
-- Performing Test COMPILER_SUPPORTS_RDYNAMIC - Success
-- Building using own protobuf under third_party per request.
-- Use custom protobuf build.
-- 
-- 3.11.4.0
-- Looking for pthread.h
-- Looking for pthread.h - found
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Failed
-- Check if compiler accepts -pthread
-- Check if compiler accepts -pthread - yes
-- Found Threads: TRUE  
-- Performing Test protobuf_HAVE_BUILTIN_ATOMICS
-- Performing Test protobuf_HAVE_BUILTIN_ATOMICS - Success
-- Caffe2 protobuf include directory: $<BUILD_INTERFACE:/home/username/pytorch/third_party/protobuf/src>$<INSTALL_INTERFACE:include>
-- Trying to find preferred BLAS backend of choice: MKL
-- MKL_THREADING = OMP
-- Looking for sys/types.h
-- Looking for sys/types.h - found
-- Looking for stdint.h
-- Looking for stdint.h - found
-- Looking for stddef.h
-- Looking for stddef.h - found
-- Check size of void*
-- Check size of void* - done
-- Looking for cblas_sgemm
-- Looking for cblas_sgemm - found
-- MKL libraries: /usr/lib/x86_64-linux-gnu/libmkl_intel_lp64.so;/usr/lib/x86_64-linux-gnu/libmkl_gnu_thread.so;/usr/lib/x86_64-linux-gnu/libmkl_core.so;-fopenmp;/usr/lib/x86_64-linux-gnu/libpthread.so;/usr/lib/x86_64-linux-gnu/libm.so;/usr/lib/x86_64-linux-gnu/libdl.so
-- MKL include directory: /home/username/miniconda3/pkgs/mkl-include-2021.2.0-h06a4308_296/include
-- MKL OpenMP type: GNU
-- MKL OpenMP library: -fopenmp
-- The ASM compiler identification is GNU
-- Found assembler: /usr/bin/cc
-- Brace yourself, we are building NNPACK
-- Performing Test NNPACK_ARCH_IS_X86_32
-- Performing Test NNPACK_ARCH_IS_X86_32 - Failed
-- Found PythonInterp: /home/username/miniconda3/envs/torchdev/bin/python (found version "3.7.9") 
-- NNPACK backend is x86-64
-- Failed to find LLVM FileCheck
-- Found Git: /usr/bin/git (found version "2.25.1") 
-- Performing Test HAVE_CXX_FLAG_STD_CXX11
-- Performing Test HAVE_CXX_FLAG_STD_CXX11 - Success
-- Performing Test HAVE_CXX_FLAG_WALL
-- Performing Test HAVE_CXX_FLAG_WALL - Success
-- Performing Test HAVE_CXX_FLAG_WEXTRA
-- Performing Test HAVE_CXX_FLAG_WEXTRA - Success
-- Performing Test HAVE_CXX_FLAG_WSHADOW
-- Performing Test HAVE_CXX_FLAG_WSHADOW - Success
-- Performing Test HAVE_CXX_FLAG_WERROR
-- Performing Test HAVE_CXX_FLAG_WERROR - Success
-- Performing Test HAVE_CXX_FLAG_PEDANTIC
-- Performing Test HAVE_CXX_FLAG_PEDANTIC - Success
-- Performing Test HAVE_CXX_FLAG_PEDANTIC_ERRORS
-- Performing Test HAVE_CXX_FLAG_PEDANTIC_ERRORS - Success
-- Performing Test HAVE_CXX_FLAG_WSHORTEN_64_TO_32
-- Performing Test HAVE_CXX_FLAG_WSHORTEN_64_TO_32 - Failed
-- Performing Test HAVE_CXX_FLAG_WFLOAT_EQUAL
-- Performing Test HAVE_CXX_FLAG_WFLOAT_EQUAL - Success
-- Performing Test HAVE_CXX_FLAG_FSTRICT_ALIASING
-- Performing Test HAVE_CXX_FLAG_FSTRICT_ALIASING - Success
-- Performing Test HAVE_CXX_FLAG_WNO_DEPRECATED_DECLARATIONS
-- Performing Test HAVE_CXX_FLAG_WNO_DEPRECATED_DECLARATIONS - Success
-- Performing Test HAVE_CXX_FLAG_WSTRICT_ALIASING
-- Performing Test HAVE_CXX_FLAG_WSTRICT_ALIASING - Success
-- Performing Test HAVE_CXX_FLAG_WD654
-- Performing Test HAVE_CXX_FLAG_WD654 - Failed
-- Performing Test HAVE_CXX_FLAG_WTHREAD_SAFETY
-- Performing Test HAVE_CXX_FLAG_WTHREAD_SAFETY - Failed
-- Performing Test HAVE_CXX_FLAG_COVERAGE
-- Performing Test HAVE_CXX_FLAG_COVERAGE - Success
-- Performing Test COMPILER_SUPPORTS_AVX512
-- Performing Test COMPILER_SUPPORTS_AVX512 - Success
-- Found OpenMP_C: -fopenmp (found version "4.5") 
-- Found OpenMP_CXX: -fopenmp (found version "4.5") 
-- Found OpenMP: TRUE (found version "4.5")  
-- Performing Test __CxxFlag__fno_threadsafe_statics
-- Performing Test __CxxFlag__fno_threadsafe_statics - Success
-- Performing Test __CxxFlag__fno_semantic_interposition
-- Performing Test __CxxFlag__fno_semantic_interposition - Success
-- Performing Test __CxxFlag__fmerge_all_constants
-- Performing Test __CxxFlag__fmerge_all_constants - Success
-- Performing Test __CxxFlag__fno_enforce_eh_specs
-- Performing Test __CxxFlag__fno_enforce_eh_specs - Success
-- Found Numa: /usr/include  
-- Found Numa  (include: /usr/include, library: /usr/lib/x86_64-linux-gnu/libnuma.so)
-- Using third party subdirectory Eigen.
-- Found PythonInterp: /home/username/miniconda3/envs/torchdev/bin/python (found suitable version "3.7.9", minimum required is "3.0") 
-- Found PythonLibs: /home/username/miniconda3/envs/torchdev/lib/libpython3.7m.so.1.0 (found suitable version "3.7.9", minimum required is "3.0") 
-- Could NOT find pybind11 (missing: pybind11_DIR)
-- Could NOT find pybind11 (missing: pybind11_INCLUDE_DIR) 
-- Using third_party/pybind11.
-- pybind11 include dirs: /home/username/pytorch/cmake/../third_party/pybind11/include
-- Found MPI_C: /usr/lib/x86_64-linux-gnu/libmpich.so (found version "3.1") 
-- Found MPI_CXX: /usr/lib/x86_64-linux-gnu/libmpichcxx.so (found version "3.1") 
-- Found MPI: TRUE (found version "3.1")  
-- MPI support found
-- MPI compile flags: 
-- MPI include path: /usr/include/x86_64-linux-gnu/mpich
-- MPI LINK flags path: -Wl,-Bsymbolic-functions
-- MPI libraries: /usr/lib/x86_64-linux-gnu/libmpichcxx.so/usr/lib/x86_64-linux-gnu/libmpich.so
-- Adding OpenMP CXX_FLAGS: -fopenmp
-- Will link against OpenMP libraries: /usr/lib/gcc/x86_64-linux-gnu/9/libgomp.so;/usr/lib/x86_64-linux-gnu/libpthread.so
-- Found CUDA: /usr/local/cuda-11.2 (found version "11.2") 
-- Caffe2: CUDA detected: 11.2
-- Caffe2: CUDA nvcc is: /usr/local/cuda-11.2/bin/nvcc
-- Caffe2: CUDA toolkit directory: /usr/local/cuda-11.2
-- Caffe2: Header version is: 11.2
-- Found CUDNN: /home/username/miniconda3/pkgs/cudnn-7.6.5-cuda10.2_0/lib/libcudnn.so  
-- Found cuDNN: v7.6.5  (include: /home/username/miniconda3/pkgs/cudnn-7.6.5-cuda10.2_0/include, library: /home/username/miniconda3/pkgs/cudnn-7.6.5-cuda10.2_0/lib/libcudnn.so)
-- /usr/local/cuda-11.2/lib64/libnvrtc.so shorthash is 369df368
-- Autodetected CUDA architecture(s):  7.5 7.5
-- Added CUDA NVCC flags for: -gencode;arch=compute_75,code=sm_75
-- Autodetected CUDA architecture(s):  7.5 7.5
-- Found CUB: /usr/local/cuda-11.2/include  
-- Gloo build as SHARED library
-- MPI include path: /usr/include/x86_64-linux-gnu/mpich
-- MPI libraries: /usr/lib/x86_64-linux-gnu/libmpichcxx.so/usr/lib/x86_64-linux-gnu/libmpich.so
-- Found CUDA: /usr/local/cuda-11.2 (found suitable version "11.2", minimum required is "7.0") 
-- CUDA detected: 11.2
-- Found NCCL: /home/username/miniconda3/pkgs/nccl-1.3.5-cuda10.0_0/include  
-- Determining NCCL version from the header file: /home/username/miniconda3/pkgs/nccl-1.3.5-cuda10.0_0/include/nccl.h
-- Found NCCL (include: /home/username/miniconda3/pkgs/nccl-1.3.5-cuda10.0_0/include, library: /home/username/miniconda3/pkgs/nccl-1.3.5-cuda10.0_0/lib/libnccl.so)
-- Found CUDA: /usr/local/cuda-11.2 (found version "11.2") 
-- Performing Test UV_LINT_W4
-- Performing Test UV_LINT_W4 - Failed
-- Performing Test UV_LINT_NO_UNUSED_PARAMETER_MSVC
-- Performing Test UV_LINT_NO_UNUSED_PARAMETER_MSVC - Failed
-- Performing Test UV_LINT_NO_CONDITIONAL_CONSTANT_MSVC
-- Performing Test UV_LINT_NO_CONDITIONAL_CONSTANT_MSVC - Failed
-- Performing Test UV_LINT_NO_NONSTANDARD_MSVC
-- Performing Test UV_LINT_NO_NONSTANDARD_MSVC - Failed
-- Performing Test UV_LINT_NO_NONSTANDARD_EMPTY_TU_MSVC
-- Performing Test UV_LINT_NO_NONSTANDARD_EMPTY_TU_MSVC - Failed
-- Performing Test UV_LINT_NO_NONSTANDARD_FILE_SCOPE_MSVC
-- Performing Test UV_LINT_NO_NONSTANDARD_FILE_SCOPE_MSVC - Failed
-- Performing Test UV_LINT_NO_NONSTANDARD_NONSTATIC_DLIMPORT_MSVC
-- Performing Test UV_LINT_NO_NONSTANDARD_NONSTATIC_DLIMPORT_MSVC - Failed
-- Performing Test UV_LINT_NO_HIDES_LOCAL
-- Performing Test UV_LINT_NO_HIDES_LOCAL - Failed
-- Performing Test UV_LINT_NO_HIDES_PARAM
-- Performing Test UV_LINT_NO_HIDES_PARAM - Failed
-- Performing Test UV_LINT_NO_HIDES_GLOBAL
-- Performing Test UV_LINT_NO_HIDES_GLOBAL - Failed
-- Performing Test UV_LINT_NO_CONDITIONAL_ASSIGNMENT_MSVC
-- Performing Test UV_LINT_NO_CONDITIONAL_ASSIGNMENT_MSVC - Failed
-- Performing Test UV_LINT_NO_UNSAFE_MSVC
-- Performing Test UV_LINT_NO_UNSAFE_MSVC - Failed
-- Performing Test UV_LINT_WALL
-- Performing Test UV_LINT_WALL - Success
-- Performing Test UV_LINT_NO_UNUSED_PARAMETER
-- Performing Test UV_LINT_NO_UNUSED_PARAMETER - Success
-- Performing Test UV_LINT_STRICT_PROTOTYPES
-- Performing Test UV_LINT_STRICT_PROTOTYPES - Success
-- Performing Test UV_LINT_EXTRA
-- Performing Test UV_LINT_EXTRA - Success
-- Performing Test UV_LINT_UTF8_MSVC
-- Performing Test UV_LINT_UTF8_MSVC - Failed
-- Performing Test UV_F_STRICT_ALIASING
-- Performing Test UV_F_STRICT_ALIASING - Success
-- summary of build options:
    Install prefix:  /home/username/pytorch/torch
    Target system:   Linux
    Compiler:
      C compiler:    /usr/bin/cc
      CFLAGS:          -fopenmp

-- Found uv: 1.38.1 (found version "1.38.1") 
-- 
-- ******** Summary ********
--   CMake version         : 3.19.6
--   CMake command         : /home/username/miniconda3/envs/torchdev/bin/cmake
--   System                : Linux
--   C++ compiler          : /usr/bin/c++
--   C++ compiler version  : 9.3.0
--   CXX flags             :  -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -Wnon-virtual-dtor
--   Build type            : Release
--   Compile definitions   : TH_BLAS_MKL;ONNX_ML=1;ONNXIFI_ENABLE_EXT=1
--   CMAKE_PREFIX_PATH     : /home/username/miniconda3/envs/torchdev/lib/python3.7/site-packages;/usr/local/cuda-11.2
--   CMAKE_INSTALL_PREFIX  : /home/username/pytorch/torch
--   CMAKE_MODULE_PATH     : /home/username/pytorch/cmake/Modules;/home/username/pytorch/cmake/public/../Modules_CUDA_fix
-- 
--   ONNX version          : 1.8.0
--   ONNX NAMESPACE        : onnx_torch
--   ONNX_BUILD_TESTS      : OFF
--   ONNX_BUILD_BENCHMARKS : OFF
--   ONNX_USE_LITE_PROTO   : OFF
--   ONNXIFI_DUMMY_BACKEND : OFF
--   ONNXIFI_ENABLE_EXT    : OFF
-- 
--   Protobuf compiler     : 
--   Protobuf includes     : 
--   Protobuf libraries    : 
--   BUILD_ONNX_PYTHON     : OFF
-- 


cont.

-- ******** Summary ********
--   CMake version         : 3.19.6
--   CMake command         : /home/username/miniconda3/envs/torchdev/bin/cmake
--   System                : Linux
--   C++ compiler          : /usr/bin/c++
--   C++ compiler version  : 9.3.0
--   CXX flags             :  -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -Wnon-virtual-dtor
--   Build type            : Release
--   Compile definitions   : TH_BLAS_MKL;ONNX_ML=1;ONNXIFI_ENABLE_EXT=1
--   CMAKE_PREFIX_PATH     : /home/username/miniconda3/envs/torchdev/lib/python3.7/site-packages;/usr/local/cuda-11.2
--   CMAKE_INSTALL_PREFIX  : /home/username/pytorch/torch
--   CMAKE_MODULE_PATH     : /home/username/pytorch/cmake/Modules;/home/username/pytorch/cmake/public/../Modules_CUDA_fix
-- 
--   ONNX version          : 1.4.1
--   ONNX NAMESPACE        : onnx_torch
--   ONNX_BUILD_TESTS      : OFF
--   ONNX_BUILD_BENCHMARKS : OFF
--   ONNX_USE_LITE_PROTO   : OFF
--   ONNXIFI_DUMMY_BACKEND : OFF
-- 
--   Protobuf compiler     : 
--   Protobuf includes     : 
--   Protobuf libraries    : 
--   BUILD_ONNX_PYTHON     : OFF
-- Found CUDA with FP16 support, compiling with torch.cuda.HalfTensor
-- Adding -DNDEBUG to compile flags
-- Checking prototype magma_get_sgeqrf_nb for MAGMA_V2
-- Checking prototype magma_get_sgeqrf_nb for MAGMA_V2 - True
-- Compiling with MAGMA support
-- MAGMA INCLUDE DIRECTORIES: /home/username/miniconda3/pkgs/magma-2.5.0-hc5c8b49_0/include
-- MAGMA LIBRARIES: /home/username/miniconda3/pkgs/magma-2.5.0-hc5c8b49_0/lib/libmagma.a
-- MAGMA V2 check: 1
-- Could not find hardware support for NEON on this machine.
-- No OMAP3 processor on this machine.
-- No OMAP4 processor on this machine.
-- Looking for cpuid.h
-- Looking for cpuid.h - found
-- Performing Test HAVE_GCC_GET_CPUID
-- Performing Test HAVE_GCC_GET_CPUID - Success
-- Performing Test NO_GCC_EBX_FPIC_BUG
-- Performing Test NO_GCC_EBX_FPIC_BUG - Success
-- Performing Test C_VSX_FOUND
-- Performing Test C_VSX_FOUND - Failed
-- Performing Test CXX_VSX_FOUND
-- Performing Test CXX_VSX_FOUND - Failed
-- Performing Test C_HAS_AVX_1
-- Performing Test C_HAS_AVX_1 - Failed
-- Performing Test C_HAS_AVX_2
-- Performing Test C_HAS_AVX_2 - Success
-- Performing Test C_HAS_AVX2_1
-- Performing Test C_HAS_AVX2_1 - Failed
-- Performing Test C_HAS_AVX2_2
-- Performing Test C_HAS_AVX2_2 - Success
-- Performing Test CXX_HAS_AVX_1
-- Performing Test CXX_HAS_AVX_1 - Failed
-- Performing Test CXX_HAS_AVX_2
-- Performing Test CXX_HAS_AVX_2 - Success
-- Performing Test CXX_HAS_AVX2_1
-- Performing Test CXX_HAS_AVX2_1 - Failed
-- Performing Test CXX_HAS_AVX2_2
-- Performing Test CXX_HAS_AVX2_2 - Success
-- AVX compiler support found
-- AVX2 compiler support found
-- Performing Test BLAS_F2C_DOUBLE_WORKS
-- Performing Test BLAS_F2C_DOUBLE_WORKS - Failed
-- Performing Test BLAS_F2C_FLOAT_WORKS
-- Performing Test BLAS_F2C_FLOAT_WORKS - Success
-- Performing Test BLAS_USE_CBLAS_DOT
-- Performing Test BLAS_USE_CBLAS_DOT - Success
-- Found a library with BLAS API (mkl). Full path: (/usr/lib/x86_64-linux-gnu/libmkl_intel_lp64.so;/usr/lib/x86_64-linux-gnu/libmkl_gnu_thread.so;/usr/lib/x86_64-linux-gnu/libmkl_core.so;-fopenmp;/usr/lib/x86_64-linux-gnu/libpthread.so;/usr/lib/x86_64-linux-gnu/libm.so;/usr/lib/x86_64-linux-gnu/libdl.so)
-- Found a library with LAPACK API (mkl).
-- MIOpen not found. Compiling without MIOpen support
-- MKLDNN_CPU_RUNTIME = OMP
-- Intel MKL-DNN compat: set DNNL_ENABLE_CONCURRENT_EXEC to MKLDNN_ENABLE_CONCURRENT_EXEC with value `ON`
-- Intel MKL-DNN compat: set DNNL_BUILD_EXAMPLES to MKLDNN_BUILD_EXAMPLES with value `FALSE`
-- Intel MKL-DNN compat: set DNNL_BUILD_TESTS to MKLDNN_BUILD_TESTS with value `FALSE`
-- Intel MKL-DNN compat: set DNNL_LIBRARY_TYPE to MKLDNN_LIBRARY_TYPE with value `STATIC`
-- Intel MKL-DNN compat: set DNNL_ARCH_OPT_FLAGS to MKLDNN_ARCH_OPT_FLAGS with value `-msse4`
-- Intel MKL-DNN compat: set DNNL_CPU_RUNTIME to MKLDNN_CPU_RUNTIME with value `OMP`
-- Found OpenMP_C: -fopenmp (found version "4.5") 
-- Found OpenMP_CXX: -fopenmp (found version "4.5") 
-- Primitive cache is enabled
-- Found MKL-DNN: TRUE
-- Looking for clock_gettime in rt
-- Looking for clock_gettime in rt - found
-- Looking for mmap
-- Looking for mmap - found
-- Looking for shm_open
-- Looking for shm_open - found
-- Looking for shm_unlink
-- Looking for shm_unlink - found
-- Looking for malloc_usable_size
-- Looking for malloc_usable_size - found
-- Performing Test C_HAS_THREAD
-- Performing Test C_HAS_THREAD - Success
-- Version: 7.0.3
-- Build type: Release
-- CXX_STANDARD: 14
-- Performing Test has_std_14_flag
-- Performing Test has_std_14_flag - Success
-- Performing Test has_std_1y_flag
-- Performing Test has_std_1y_flag - Success
-- Performing Test SUPPORTS_USER_DEFINED_LITERALS
-- Performing Test SUPPORTS_USER_DEFINED_LITERALS - Success
-- Performing Test FMT_HAS_VARIANT
-- Performing Test FMT_HAS_VARIANT - Success
-- Required features: cxx_variadic_templates
-- Looking for strtod_l
-- Looking for strtod_l - not found
-- CUDA build detected, configuring Kineto with CUPTI support.
-- Configuring Kineto dependency:
--   KINETO_SOURCE_DIR = /home/username/pytorch/third_party/kineto/libkineto
--   KINETO_BUILD_TESTS = OFF
--   KINETO_LIBRARY_TYPE = static
--   CUDA_SOURCE_DIR = /usr/local/cuda-11.2
--   CUDA_cupti_LIBRARY = /usr/local/cuda-11.2/extras/CUPTI/lib64/libcupti_static.a
--   CUPTI_INCLUDE_DIR = /usr/local/cuda-11.2/extras/CUPTI/include
-- Found PythonInterp: /home/username/miniconda3/envs/torchdev/bin/python (found version "3.7.9") 
-- Kineto: FMT_SOURCE_DIR = /home/username/pytorch/third_party/fmt
-- Kineto: FMT_INCLUDE_DIR = /home/username/pytorch/third_party/fmt/include
-- Configured Kineto
-- GCC 9.3.0: Adding gcc and gcc_s libs to link line
-- Performing Test HAS_WERROR_FORMAT
-- Performing Test HAS_WERROR_FORMAT - Success
-- Performing Test HAS_WERROR_CAST_FUNCTION_TYPE
-- Performing Test HAS_WERROR_CAST_FUNCTION_TYPE - Success
-- Looking for backtrace
-- Looking for backtrace - found
-- backtrace facility detected in default set of libraries
-- Found Backtrace: /usr/include  
-- NUMA paths:
-- /usr/include
-- /usr/lib/x86_64-linux-gnu/libnuma.so
-- Performing Test COMPILER_SUPPORTS_NO_AVX256_SPLIT
-- Performing Test COMPILER_SUPPORTS_NO_AVX256_SPLIT - Success
-- Using ATen parallel backend: OMP
-- Found OpenSSL: /usr/lib/x86_64-linux-gnu/libcrypto.so (found version "1.1.1f")  
-- Check size of long double
-- Check size of long double - done
-- Performing Test COMPILER_SUPPORTS_LONG_DOUBLE
-- Performing Test COMPILER_SUPPORTS_LONG_DOUBLE - Success
-- Performing Test COMPILER_SUPPORTS_FLOAT128
-- Performing Test COMPILER_SUPPORTS_FLOAT128 - Success
-- Performing Test COMPILER_SUPPORTS_SSE2
-- Performing Test COMPILER_SUPPORTS_SSE2 - Success
-- Performing Test COMPILER_SUPPORTS_SSE4
-- Performing Test COMPILER_SUPPORTS_SSE4 - Success
-- Performing Test COMPILER_SUPPORTS_AVX
-- Performing Test COMPILER_SUPPORTS_AVX - Success
-- Performing Test COMPILER_SUPPORTS_FMA4
-- Performing Test COMPILER_SUPPORTS_FMA4 - Success
-- Performing Test COMPILER_SUPPORTS_AVX2
-- Performing Test COMPILER_SUPPORTS_AVX2 - Success
-- Performing Test COMPILER_SUPPORTS_AVX512F
-- Performing Test COMPILER_SUPPORTS_AVX512F - Success
-- Performing Test COMPILER_SUPPORTS_OPENMP
-- Performing Test COMPILER_SUPPORTS_OPENMP - Success
-- Performing Test COMPILER_SUPPORTS_WEAK_ALIASES
-- Performing Test COMPILER_SUPPORTS_WEAK_ALIASES - Success
-- Performing Test COMPILER_SUPPORTS_BUILTIN_MATH
-- Performing Test COMPILER_SUPPORTS_BUILTIN_MATH - Success
-- Performing Test COMPILER_SUPPORTS_SYS_GETRANDOM
-- Performing Test COMPILER_SUPPORTS_SYS_GETRANDOM - Success
-- Configuring build for SLEEF-v3.6.0
-- Using option `-Wall -Wno-unused -Wno-attributes -Wno-unused-result -Wno-psabi -ffp-contract=off -fno-math-errno -fno-trapping-math` to compile libsleef
-- Building shared libs : OFF
-- Building static test bins: OFF
-- MPFR : LIB_MPFR-NOTFOUND
-- GMP : LIBGMP-NOTFOUND
-- RT : /usr/lib/x86_64-linux-gnu/librt.so
-- FFTW3 : LIBFFTW3-NOTFOUND
-- OPENSSL : 1.1.1f
-- SDE : SDE_COMMAND-NOTFOUND
-- RUNNING_ON_TRAVIS : 
-- COMPILER_SUPPORTS_OPENMP : 1
-- Include NCCL operators
-- Excluding FakeLowP operators
-- Including IDEEP operators
-- Excluding image processing operators due to no opencv
-- Excluding video processing operators due to no opencv
-- Include Observer library
-- breakpad library not found
-- /usr/bin/c++ /home/username/pytorch/torch/abi-check.cpp -o /home/username/pytorch/build/abi-check
-- Determined _GLIBCXX_USE_CXX11_ABI=1
-- MPI_INCLUDE_PATH: /usr/include/x86_64-linux-gnu/mpich
-- MPI_LIBRARIES: /usr/lib/x86_64-linux-gnu/libmpichcxx.so;/usr/lib/x86_64-linux-gnu/libmpich.so
-- MPIEXEC: /usr/bin/mpiexec
-- Autodetected CUDA architecture(s):  7.5 7.5
-- pytorch is compiling with OpenMP. 
OpenMP CXX_FLAGS: -fopenmp. 
OpenMP libraries: /usr/lib/gcc/x86_64-linux-gnu/9/libgomp.so;/usr/lib/x86_64-linux-gnu/libpthread.so.
-- Caffe2 is compiling with OpenMP. 
OpenMP CXX_FLAGS: -fopenmp. 
OpenMP libraries: /usr/lib/gcc/x86_64-linux-gnu/9/libgomp.so;/usr/lib/x86_64-linux-gnu/libpthread.so.
-- Using lib/python3.7/site-packages as python relative installation path
-- 
-- ******** Summary ********
-- General:
--   CMake version         : 3.19.6
--   CMake command         : /home/username/miniconda3/envs/torchdev/bin/cmake
--   System                : Linux
--   C++ compiler          : /usr/bin/c++
--   C++ compiler id       : GNU
--   C++ compiler version  : 9.3.0
--   Using ccache if found : ON
--   Found ccache          : /home/username/miniconda3/bin/ccache
--   CXX flags             :  -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow
--   Build type            : Release
--   Compile definitions   : TH_BLAS_MKL;ONNX_ML=1;ONNXIFI_ENABLE_EXT=1;ONNX_NAMESPACE=onnx_torch;MAGMA_V2;IDEEP_USE_MKL;HAVE_MMAP=1;_FILE_OFFSET_BITS=64;HAVE_SHM_OPEN=1;HAVE_SHM_UNLINK=1;HAVE_MALLOC_USABLE_SIZE=1;USE_EXTERNAL_MZCRC;MINIZ_DISABLE_ZIP_READER_CRC32_CHECKS
--   CMAKE_PREFIX_PATH     : /home/username/miniconda3/envs/torchdev/lib/python3.7/site-packages;/usr/local/cuda-11.2
--   CMAKE_INSTALL_PREFIX  : /home/username/pytorch/torch
--   USE_GOLD_LINKER       : OFF
-- 
--   TORCH_VERSION         : 1.9.0
--   CAFFE2_VERSION        : 1.9.0
--   BUILD_CAFFE2          : ON
--   BUILD_CAFFE2_OPS      : ON
--   BUILD_CAFFE2_MOBILE   : OFF
--   BUILD_STATIC_RUNTIME_BENCHMARK: OFF
--   BUILD_TENSOREXPR_BENCHMARK: OFF
--   BUILD_BINARY          : OFF
--   BUILD_CUSTOM_PROTOBUF : ON
--     Link local protobuf : ON
--   BUILD_DOCS            : OFF
--   BUILD_PYTHON          : True
--     Python version      : 3.7.9
--     Python executable   : /home/username/miniconda3/envs/torchdev/bin/python
--     Pythonlibs version  : 3.7.9
--     Python library      : /home/username/miniconda3/envs/torchdev/lib/libpython3.7m.so.1.0
--     Python includes     : /home/username/miniconda3/envs/torchdev/include/python3.7m
--     Python site-packages: lib/python3.7/site-packages
--   BUILD_SHARED_LIBS     : ON
--   CAFFE2_USE_MSVC_STATIC_RUNTIME     : OFF
--   BUILD_TEST            : True
--   BUILD_JNI             : OFF
--   BUILD_MOBILE_AUTOGRAD : OFF
--   BUILD_LITE_INTERPRETER: OFF
--   INTERN_BUILD_MOBILE   : 
--   USE_BLAS              : 1
--     BLAS                : mkl
--   USE_LAPACK            : 1
--     LAPACK              : mkl
--   USE_ASAN              : OFF
--   USE_CPP_CODE_COVERAGE : OFF
--   USE_CUDA              : ON
--     Split CUDA          : OFF
--     CUDA static link    : OFF
--     USE_CUDNN           : ON
--     CUDA version        : 11.2
--     cuDNN version       : 7.6.5
--     CUDA root directory : /usr/local/cuda-11.2
--     CUDA library        : /usr/local/cuda-11.2/lib64/stubs/libcuda.so
--     cudart library      : /usr/local/cuda-11.2/lib64/libcudart.so
--     cublas library      : /usr/local/cuda-11.2/lib64/libcublas.so
--     cufft library       : /usr/local/cuda-11.2/lib64/libcufft.so
--     curand library      : /usr/local/cuda-11.2/lib64/libcurand.so
--     cuDNN library       : /home/username/miniconda3/pkgs/cudnn-7.6.5-cuda10.2_0/lib/libcudnn.so
--     nvrtc               : /usr/local/cuda-11.2/lib64/libnvrtc.so
--     CUDA include path   : /usr/local/cuda-11.2/include
--     NVCC executable     : /usr/local/cuda-11.2/bin/nvcc
--     NVCC flags          : -Xfatbin;-compress-all;-DONNX_NAMESPACE=onnx_torch;-gencode;arch=compute_75,code=sm_75;-Xcudafe;--diag_suppress=cc_clobber_ignored,--diag_suppress=integer_sign_change,--diag_suppress=useless_using_declaration,--diag_suppress=set_but_not_used,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=implicit_return_from_non_void_function,--diag_suppress=unsigned_compare_with_zero,--diag_suppress=declared_but_not_referenced,--diag_suppress=bad_friend_decl;-std=c++14;-Xcompiler;-fPIC;--expt-relaxed-constexpr;--expt-extended-lambda;-Wno-deprecated-gpu-targets;--expt-extended-lambda;-Xcompiler;-fPIC;-DCUDA_HAS_FP16=1;-D__CUDA_NO_HALF_OPERATORS__;-D__CUDA_NO_HALF_CONVERSIONS__;-D__CUDA_NO_BFLOAT16_CONVERSIONS__;-D__CUDA_NO_HALF2_OPERATORS__
--     CUDA host compiler  : /usr/bin/cc
--     NVCC --device-c     : OFF
--     USE_TENSORRT        : OFF
--   USE_ROCM              : OFF
--   USE_EIGEN_FOR_BLAS    : 
--   USE_FBGEMM            : ON
--     USE_FAKELOWP          : OFF
--   USE_KINETO            : ON
--   USE_FFMPEG            : OFF
--   USE_GFLAGS            : OFF
--   USE_GLOG              : OFF
--   USE_LEVELDB           : OFF
--   USE_LITE_PROTO        : OFF
--   USE_LMDB              : OFF
--   USE_METAL             : OFF
--   USE_PYTORCH_METAL     : OFF
--   USE_FFTW              : OFF
--   USE_MKL               : ON
--   USE_MKLDNN            : ON
--   USE_MKLDNN_CBLAS      : OFF
--   USE_NCCL              : ON
--     USE_SYSTEM_NCCL     : OFF
--   USE_NNPACK            : ON
--   USE_NUMPY             : ON
--   USE_OBSERVERS         : ON
--   USE_OPENCL            : OFF
--   USE_OPENCV            : OFF
--   USE_OPENMP            : ON
--   USE_TBB               : OFF
--   USE_VULKAN            : OFF
--   USE_PROF              : OFF
--   USE_QNNPACK           : ON
--   USE_PYTORCH_QNNPACK   : ON
--   USE_REDIS             : OFF
--   USE_ROCKSDB           : OFF
--   USE_ZMQ               : OFF
--   USE_DISTRIBUTED       : ON
--     USE_MPI             : ON
--     USE_GLOO            : ON
--     USE_TENSORPIPE      : ON
--   USE_DEPLOY           : OFF
--   Public Dependencies  : Threads::Threads;caffe2::mkl;caffe2::mkldnn
--   Private Dependencies : pthreadpool;cpuinfo;qnnpack;pytorch_qnnpack;nnpack;XNNPACK;fbgemm;/usr/lib/x86_64-linux-gnu/libnuma.so;fp16;/usr/lib/x86_64-linux-gnu/libmpichcxx.so;/usr/lib/x86_64-linux-gnu/libmpich.so;gloo;tensorpipe;aten_op_header_gen;foxi_loader;rt;fmt::fmt-header-only;kineto;gcc_s;gcc;dl
-- Configuring done
-- Generating done
-- Build files have been written to: /home/username/pytorch/build

and here is where the error occurs:

[5818/6121] Linking CXX shared library lib/libtorch.so
[5819/6121] Linking CXX executable bin/scalar_tensor_test
FAILED: bin/scalar_tensor_test 
: && /usr/bin/c++ -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow -DHAVE_AVX_CPU_DEFINITION -DHAVE_AVX2_CPU_DEFINITION -O3 -DNDEBUG -DNDEBUG -rdynamic -Wl,-Bsymbolic-functions caffe2/CMakeFiles/scalar_tensor_test.dir/__/aten/src/ATen/test/scalar_tensor_test.cpp.o -o bin/scalar_tensor_test  -Wl,-rpath,/home/username/pytorch/build/lib:/usr/local/cuda-11.2/lib64:/home/username/miniconda3/pkgs/cudnn-7.6.5-cuda10.2_0/lib:  lib/libgtest_main.a  -Wl,--no-as-needed,"/home/username/pytorch/build/lib/libtorch.so" -Wl,--as-needed  -Wl,--no-as-needed,"/home/username/pytorch/build/lib/libtorch_cpu.so" -Wl,--as-needed  lib/libprotobuf.a  -lmkl_intel_lp64  -lmkl_gnu_thread  -lmkl_core  -fopenmp  /usr/lib/x86_64-linux-gnu/libpthread.so  -lm  /usr/lib/x86_64-linux-gnu/libdl.so  lib/libdnnl.a  -ldl  -Wl,--no-as-needed,"/home/username/pytorch/build/lib/libtorch_cuda.so" -Wl,--as-needed  lib/libc10_cuda.so  lib/libc10.so  /usr/local/cuda-11.2/lib64/libcudart.so  /usr/lib/x86_64-linux-gnu/libnvToolsExt.so  /usr/local/cuda-11.2/lib64/libcufft.so  /usr/local/cuda-11.2/lib64/libcurand.so  /usr/local/cuda-11.2/lib64/libcublas.so  /home/username/miniconda3/pkgs/cudnn-7.6.5-cuda10.2_0/lib/libcudnn.so  
lib/libgtest.a  -pthread && :
/usr/bin/ld: warning: libcudart.so.10.0, needed by /home/username/miniconda3/pkgs/nccl-1.3.5-cuda10.0_0/lib/libnccl.so.1, not found (try using -rpath or -rpath-link)
/usr/bin/ld: /home/username/miniconda3/pkgs/nccl-1.3.5-cuda10.0_0/lib/libnccl.so.1: undefined reference to `cudaHostUnregister@libcudart.so.10.0'
/usr/bin/ld: /home/username/miniconda3/pkgs/nccl-1.3.5-cuda10.0_0/lib/libnccl.so.1: undefined reference to `cudaDeviceCanAccessPeer@libcudart.so.10.0'
/usr/bin/ld: /home/username/miniconda3/pkgs/nccl-1.3.5-cuda10.0_0/lib/libnccl.so.1: undefined reference to `cudaIpcGetMemHandle@libcudart.so.10.0'
/usr/bin/ld: /home/username/miniconda3/pkgs/nccl-1.3.5-cuda10.0_0/lib/libnccl.so.1: undefined reference to `__cudaPopCallConfiguration@libcudart.so.10.0'
/usr/bin/ld: /home/username/miniconda3/pkgs/nccl-1.3.5-cuda10.0_0/lib/libnccl.so.1: undefined reference to `cudaPointerGetAttributes@libcudart.so.10.0'
/usr/bin/ld: /home/username/miniconda3/pkgs/nccl-1.3.5-cuda10.0_0/lib/libnccl.so.1: undefined reference to `cudaLaunchKernel@libcudart.so.10.0'
/usr/bin/ld: /home/username/miniconda3/pkgs/nccl-1.3.5-cuda10.0_0/lib/libnccl.so.1: undefined reference to `__cudaRegisterFatBinary@libcudart.so.10.0'
/usr/bin/ld: /home/username/miniconda3/pkgs/nccl-1.3.5-cuda10.0_0/lib/libnccl.so.1: undefined reference to `cudaDeviceGetPCIBusId@libcudart.so.10.0'
/usr/bin/ld: /home/username/miniconda3/pkgs/nccl-1.3.5-cuda10.0_0/lib/libnccl.so.1: undefined reference to `cudaIpcCloseMemHandle@libcudart.so.10.0'
/usr/bin/ld: /home/username/miniconda3/pkgs/nccl-1.3.5-cuda10.0_0/lib/libnccl.so.1: undefined reference to `cudaDeviceEnablePeerAccess@libcudart.so.10.0'
/usr/bin/ld: /home/username/miniconda3/pkgs/nccl-1.3.5-cuda10.0_0/lib/libnccl.so.1: undefined reference to `cudaEventDestroy@libcudart.so.10.0'
/usr/bin/ld: /home/username/miniconda3/pkgs/nccl-1.3.5-cuda10.0_0/lib/libnccl.so.1: undefined reference to `cudaMalloc@libcudart.so.10.0'
/usr/bin/ld: /home/username/miniconda3/pkgs/nccl-1.3.5-cuda10.0_0/lib/libnccl.so.1: undefined reference to `cudaHostRegister@libcudart.so.10.0'
/usr/bin/ld: /home/username/miniconda3/pkgs/nccl-1.3.5-cuda10.0_0/lib/libnccl.so.1: undefined reference to `cudaMallocHost@libcudart.so.10.0'
/usr/bin/ld: /home/username/miniconda3/pkgs/nccl-1.3.5-cuda10.0_0/lib/libnccl.so.1: undefined reference to `cudaGetErrorString@libcudart.so.10.0'
/usr/bin/ld: /home/username/miniconda3/pkgs/nccl-1.3.5-cuda10.0_0/lib/libnccl.so.1: undefined reference to `__cudaRegisterFunction@libcudart.so.10.0'
/usr/bin/ld: /home/username/miniconda3/pkgs/nccl-1.3.5-cuda10.0_0/lib/libnccl.so.1: undefined reference to `__cudaUnregisterFatBinary@libcudart.so.10.0'
/usr/bin/ld: /home/username/miniconda3/pkgs/nccl-1.3.5-cuda10.0_0/lib/libnccl.so.1: undefined reference to `cudaEventCreateWithFlags@libcudart.so.10.0'
/usr/bin/ld: /home/username/miniconda3/pkgs/nccl-1.3.5-cuda10.0_0/lib/libnccl.so.1: undefined reference to `cudaMemset@libcudart.so.10.0'
/usr/bin/ld: /home/username/miniconda3/pkgs/nccl-1.3.5-cuda10.0_0/lib/libnccl.so.1: undefined reference to `cudaIpcOpenMemHandle@libcudart.so.10.0'
/usr/bin/ld: /home/username/miniconda3/pkgs/nccl-1.3.5-cuda10.0_0/lib/libnccl.so.1: undefined reference to `cudaFree@libcudart.so.10.0'
/usr/bin/ld: /home/username/miniconda3/pkgs/nccl-1.3.5-cuda10.0_0/lib/libnccl.so.1: undefined reference to `cudaFreeHost@libcudart.so.10.0'
/usr/bin/ld: /home/username/miniconda3/pkgs/nccl-1.3.5-cuda10.0_0/lib/libnccl.so.1: undefined reference to `cudaStreamWaitEvent@libcudart.so.10.0'
/usr/bin/ld: /home/username/miniconda3/pkgs/nccl-1.3.5-cuda10.0_0/lib/libnccl.so.1: undefined reference to `cudaGetLastError@libcudart.so.10.0'
/usr/bin/ld: /home/username/miniconda3/pkgs/nccl-1.3.5-cuda10.0_0/lib/libnccl.so.1: undefined reference to `cudaEventRecord@libcudart.so.10.0'
/usr/bin/ld: /home/username/miniconda3/pkgs/nccl-1.3.5-cuda10.0_0/lib/libnccl.so.1: undefined reference to `cudaGetDevice@libcudart.so.10.0'
/usr/bin/ld: /home/username/miniconda3/pkgs/nccl-1.3.5-cuda10.0_0/lib/libnccl.so.1: undefined reference to `cudaMemcpy@libcudart.so.10.0'
/usr/bin/ld: /home/username/miniconda3/pkgs/nccl-1.3.5-cuda10.0_0/lib/libnccl.so.1: undefined reference to `cudaSetDevice@libcudart.so.10.0'
/usr/bin/ld: /home/username/miniconda3/pkgs/nccl-1.3.5-cuda10.0_0/lib/libnccl.so.1: undefined reference to `cudaMemcpyAsync@libcudart.so.10.0'
/usr/bin/ld: /home/username/miniconda3/pkgs/nccl-1.3.5-cuda10.0_0/lib/libnccl.so.1: undefined reference to `cudaHostGetDevicePointer@libcudart.so.10.0'
/usr/bin/ld: /home/username/miniconda3/pkgs/nccl-1.3.5-cuda10.0_0/lib/libnccl.so.1: undefined reference to `__cudaRegisterVar@libcudart.so.10.0'
collect2: error: ld returned 1 exit status
[5820/6121] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/fx/fx_init.cpp.o
[5821/6121] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/tensor/python_tensor.cpp.o
[5822/6121] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/autograd/python_variable_indexing.cpp.o
[5823/6121] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/jit/python/python_arg_flatten.cpp.o
[5824/6121] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/jit/passes/onnx.cpp.o
[5825/6121] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/jit/runtime/static/init.cpp.o
[5826/6121] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/jit/python/python_interpreter.cpp.o
[5827/6121] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/jit/backends/backend_init.cpp.o
[5828/6121] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/jit/python/pybind_utils.cpp.o
[5829/6121] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/jit/passes/onnx/shape_type_inference.cpp.o
[5830/6121] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/jit/python/python_custom_class.cpp.o
[5831/6121] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/jit/frontend/concrete_module_type.cpp.o
[5832/6121] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/jit/python/python_tracer.cpp.o
[5833/6121] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/jit/python/python_sugared_value.cpp.o
[5834/6121] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/jit/python/python_tree_views.cpp.o
[5835/6121] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/jit/tensorexpr/tensorexpr_init.cpp.o
[5836/6121] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/jit/python/python_ir.cpp.o
[5837/6121] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/jit/python/init.cpp.o
[5838/6121] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/jit/python/script_init.cpp.o
ninja: build stopped: subcommand failed.

A libnccl compiled against CUDA 10 doesn’t mix with the CUDA 11.2 you are using to build PyTorch. I would probably use the vendored NCCL.
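If it helps, a sketch of what that could look like (USE_SYSTEM_NCCL is the standard PyTorch build switch; cleaning first matters because CMake caches the previously detected NCCL paths, and unsetting the variables from your opening post keeps the old conda NCCL from being picked up again):

```shell
# Rebuild against the NCCL vendored in third_party/nccl instead of the
# conda package (a sketch; adjust to your environment):
python setup.py clean                      # drop the cached CMake NCCL detection
unset NCCL_INCLUDE_DIR NCCL_LIBRARY        # the variables set earlier in this thread
USE_SYSTEM_NCCL=0 python setup.py develop  # 0 = build the bundled NCCL
```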

Ok. I did a clean build now with CUDA 11.3, cuDNN 8.2.0 and /lib/x86_64-linux-gnu/libnccl.so.

The build succeeds, and I have uploaded the output of python setup.py develop here since it’s too large for one or two forum posts. But when I run test/run_test.sh, it fails almost immediately:

Running test_import_time ... [2021-05-11 20:29:24.795319]
Executing ['/home//miniconda3/envs/torchdev/bin/python', 'test_import_time.py'] ... [2021-05-11 20:29:24.795436]
..
----------------------------------------------------------------------
Ran 2 tests in 1.114s

OK
Running test_public_bindings ... [2021-05-11 20:29:26.483024]
Executing ['/home//miniconda3/envs/torchdev/bin/python', 'test_public_bindings.py'] ... [2021-05-11 20:29:26.483127]
.
----------------------------------------------------------------------
Ran 1 test in 0.000s

OK
Running test_type_hints ... [2021-05-11 20:29:27.053734]
Executing ['/home//miniconda3/envs/torchdev/bin/python', 'test_type_hints.py'] ... [2021-05-11 20:29:27.053839]
s
----------------------------------------------------------------------
Ran 1 test in 0.000s

OK (skipped=1)
Running test_autograd ... [2021-05-11 20:29:27.612298]
Executing ['/home//miniconda3/envs/torchdev/bin/python', 'test_autograd.py'] ... [2021-05-11 20:29:27.612348]
*** stack smashing detected ***: terminated
Traceback (most recent call last):
  File "test/run_test.py", line 1169, in <module>
    main()
  File "test/run_test.py", line 1148, in main
    raise RuntimeError(err_message)
RuntimeError: test_autograd failed! Received signal: SIGIOT

The next step could be to run under gdb (you can also run test_autograd directly if you want) and get a stack trace. I’m a bit surprised by the SIGIOT, though; it could be that you still have a library version mixup somewhere (try using ldd on the libraries in build/lib.*/torch/lib or so). Sometimes things like threading libraries are tricky.
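For the ldd check, something along these lines (a sketch; the build/lib path is the one from your build tree, adjust as needed — the point is to spot more than one version of any CUDA runtime library):

```shell
# Collect every distinct libcudart/libcudnn/libnccl version that the
# built torch libraries reference:
for f in build/lib/*.so; do
  ldd "$f" 2>/dev/null
done | grep -oE 'lib(cudart|cudnn|nccl)[^ ]*' | sort -u
# A healthy build lists exactly one version of each; seeing e.g. both
# libcudart.so.10.0 and libcudart.so.11.0 would indicate the kind of
# mixup that caused the earlier NCCL link errors.
```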

I did gdb --args python test_autograd.py, followed by run, with this output:

[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[Detaching after fork from child process 153339]
[New Thread 0x7fff817a9700 (LWP 153353)]
*** stack smashing detected ***: terminated

Thread 1 "python" received signal SIGABRT, Aborted.
__GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
50 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.

and backtrace with output

#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50                       [110/610]
#1  0x00007ffff7dc1859 in __GI_abort () at abort.c:79
#2  0x00007ffff7e2c3ee in __libc_message (action=action@entry=do_abort,
    fmt=fmt@entry=0x7ffff7f5607c "*** %s ***: terminated\n") at ../sysdeps/posix/libc_fatal.c:155
#3  0x00007ffff7eceb4a in __GI___fortify_fail (msg=msg@entry=0x7ffff7f56064 "stack smashing detected")
    at fortify_fail.c:26
#4  0x00007ffff7eceb16 in __stack_chk_fail () at stack_chk_fail.c:24
#5  0x00007fffb578f0f7 in magma_init () from /home/user/pytorch/torch/lib/libtorch_cuda.so
#6  0x00007fffb5060851 in at::cuda::detail::CUDAHooks::initCUDA() const ()
   from /home/user/pytorch/torch/lib/libtorch_cuda.so
#7  0x00007fffcafc1e10 in std::call_once<at::Context::lazyInitCUDA()::{lambda()#1}>(std::once_flag&, at::
Context::lazyInitCUDA()::{lambda()#1}&&)::{lambda()#2}::_FUN() ()
   from /home/user/pytorch/torch/lib/libtorch_python.so
#8  0x00007ffff7fa047f in __pthread_once_slow (
    once_control=0x7fffc9c8fe40 <at::globalContext()::globalContext_>,
    init_routine=0x7fffd75987ba <std::__once_proxy()>) at pthread_once.c:116
#9  0x00007fffcafc1774 in THCPModule_initExtension(_object*, _object*) ()
   from /home/user/pytorch/torch/lib/libtorch_python.so
#10 0x00005555556b97e1 in _PyMethodDef_RawFastCallKeywords (method=0x555557ca3f40, self=0x7ffff6cf44d0,
    args=0x5555587d9c00, nargs=<optimized out>, kwnames=<optimized out>)
    at /tmp/build/80754af9/python_1598874792229/work/Objects/call.c:632
#11 0x00005555556b9a31 in _PyCFunction_FastCallKeywords (func=0x7ffff6cf7dc0, args=<optimized out>,
    nargs=<optimized out>, kwnames=<optimized out>)
    at /tmp/build/80754af9/python_1598874792229/work/Objects/call.c:732
#12 0x0000555555725ebd in call_function (kwnames=0x0, oparg=0, pp_stack=<synthetic pointer>)
    at /tmp/build/80754af9/python_1598874792229/work/Python/ceval.c:4568
#13 _PyEval_EvalFrameDefault (f=<optimized out>, throwflag=<optimized out>)
    at /tmp/build/80754af9/python_1598874792229/work/Python/ceval.c:3093
#14 0x000055555566985b in function_code_fastcall (globals=<optimized out>, nargs=0,
    args=<optimized out>, co=0x7fff8fd3aa50)
    at /tmp/build/80754af9/python_1598874792229/work/Objects/call.c:283
#15 _PyFunction_FastCallDict (func=<optimized out>, args=0x0, nargs=0, kwargs=<optimized out>)
    at /tmp/build/80754af9/python_1598874792229/work/Objects/call.c:322
#16 0x00005555555c9ad0 in _PyObject_CallFunctionVa (callable=0x7fff8fc287a0, format=<optimized out>,
    va=<optimized out>, is_size_t=<optimized out>)
    at /tmp/build/80754af9/python_1598874792229/work/Objects/call.c:933
#17 0x00005555556c3287 in callmethod (is_size_t=0, va=0x7fffffffc710, format=0x7fffcb46aa4f "",
    callable=0x7fff8fc287a0) at /tmp/build/80754af9/python_1598874792229/work/Objects/call.c:1029
#18 PyObject_CallMethod (obj=<optimized out>, name=<optimized out>, format=0x7fffcb46aa4f "")
    at /tmp/build/80754af9/python_1598874792229/work/Objects/call.c:1048
#19 0x00007fffcaf80391 in torch::utils::cuda_lazy_init() ()
   from /home/user/pytorch/torch/lib/libtorch_python.so
#20 0x00007fffcafa8fac in torch::utils::(anonymous namespace)::internal_new_from_data(c10::TensorOptions,
 c10::ScalarType, c10::optional<c10::Device>, _object*, bool, bool, bool, bool) ()
   from /home/user/pytorch/torch/lib/libtorch_python.so
#21 0x00007fffcafadc99 in torch::utils::tensor_ctor(c10::DispatchKey, c10::ScalarType, _object*, _object*
) () from /home/user/pytorch/torch/lib/libtorch_python.so
#22 0x00007fffcabe80ab in torch::autograd::THPVariable_tensor(_object*, _object*, _object*) ()
   from /home/user/pytorch/torch/lib/libtorch_python.so
#23 0x00005555556b99b6 in _PyMethodDef_RawFastCallKeywords (method=<optimized out>, self=0x0,
    args=0x5555586b4040, nargs=<optimized out>, kwnames=<optimized out>)
    at /tmp/build/80754af9/python_1598874792229/work/Objects/call.c:693
#24 0x00005555556b9a31 in _PyCFunction_FastCallKeywords (func=0x7ffff6d0d780, args=<optimized out>,
    nargs=<optimized out>, kwnames=<optimized out>)
    at /tmp/build/80754af9/python_1598874792229/work/Objects/call.c:732
#25 0x0000555555726483 in call_function (kwnames=0x7ffff6ec2f90, oparg=<optimized out>,
    pp_stack=<synthetic pointer>) at /tmp/build/80754af9/python_1598874792229/work/Python/ceval.c:4568
#26 _PyEval_EvalFrameDefault (f=<optimized out>, throwflag=<optimized out>)
    at /tmp/build/80754af9/python_1598874792229/work/Python/ceval.c:3139
#27 0x0000555555668829 in _PyEval_EvalCodeWithName (_co=0x7fff89815c90, globals=<optimized out>,
    locals=<optimized out>, args=<optimized out>, argcount=<optimized out>, kwnames=0x0, kwargs=0x0,
    kwcount=0, kwstep=2, defs=0x0, defcount=0, kwdefs=0x0, closure=0x0, name=0x0, qualname=0x0)
    at /tmp/build/80754af9/python_1598874792229/work/Python/ceval.c:3930
#28 0x0000555555669714 in PyEval_EvalCodeEx (_co=<optimized out>, globals=<optimized out>,
    locals=<optimized out>, args=<optimized out>, argcount=<optimized out>, kws=<optimized out>,
    kwcount=0, defs=0x0, defcount=0, kwdefs=0x0, closure=0x0)
    at /tmp/build/80754af9/python_1598874792229/work/Python/ceval.c:3959
#29 0x000055555566973c in PyEval_EvalCode (co=<optimized out>, globals=<optimized out>,          [42/610]
    locals=<optimized out>) at /tmp/build/80754af9/python_1598874792229/work/Python/ceval.c:524
#30 0x0000555555730e11 in builtin_exec_impl.isra.12 (locals=0x7fff898082d0, globals=0x7fff898082d0,
    source=0x7fff89815c90) at /tmp/build/80754af9/python_1598874792229/work/Python/bltinmodule.c:1079
#31 builtin_exec (module=<optimized out>, args=<optimized out>, nargs=<optimized out>)
    at /tmp/build/80754af9/python_1598874792229/work/Python/clinic/bltinmodule.c.h:283
#32 0x000055555568a4b2 in _PyMethodDef_RawFastCallDict (method=0x5555558812e0 <builtin_methods+480>,
    self=0x7ffff7617d10, args=<optimized out>, nargs=2, kwargs=0x7fff89808a00)
    at /tmp/build/80754af9/python_1598874792229/work/Objects/call.c:530
#33 0x000055555568a5d1 in _PyCFunction_FastCallDict (func=0x7ffff761ee10, args=<optimized out>,
    nargs=<optimized out>, kwargs=<optimized out>)
    at /tmp/build/80754af9/python_1598874792229/work/Objects/call.c:585
#34 0x0000555555726c33 in do_call_core (kwdict=0x7fff89808a00, callargs=0x7fff8a000820,
    func=0x7ffff761ee10) at /tmp/build/80754af9/python_1598874792229/work/Python/ceval.c:4641
#35 _PyEval_EvalFrameDefault (f=<optimized out>, throwflag=<optimized out>)
    at /tmp/build/80754af9/python_1598874792229/work/Python/ceval.c:3191
#36 0x0000555555668829 in _PyEval_EvalCodeWithName (_co=0x7ffff75bf150, globals=<optimized out>,
    locals=<optimized out>, args=<optimized out>, argcount=<optimized out>, kwnames=0x0,
    kwargs=0x7fff8995db08, kwcount=0, kwstep=1, defs=0x0, defcount=0, kwdefs=0x0, closure=0x0,
    name=0x7ffff75bd300, qualname=0x7ffff75bd300)
    at /tmp/build/80754af9/python_1598874792229/work/Python/ceval.c:3930
#37 0x00005555556b9107 in _PyFunction_FastCallKeywords (func=<optimized out>, stack=0x7fff8995daf0,
    nargs=3, kwnames=<optimized out>)
    at /tmp/build/80754af9/python_1598874792229/work/Objects/call.c:433
#38 0x0000555555725b29 in call_function (kwnames=0x0, oparg=<optimized out>,
    pp_stack=<synthetic pointer>) at /tmp/build/80754af9/python_1598874792229/work/Python/ceval.c:4616
#39 _PyEval_EvalFrameDefault (f=<optimized out>, throwflag=<optimized out>)
    at /tmp/build/80754af9/python_1598874792229/work/Python/ceval.c:3093
#40 0x00005555556b8e7b in function_code_fastcall (globals=<optimized out>, nargs=2,
    args=<optimized out>, co=<optimized out>)
    at /tmp/build/80754af9/python_1598874792229/work/Objects/call.c:283
#41 _PyFunction_FastCallKeywords (func=<optimized out>, stack=0x5555585a6ed8, nargs=2,
    kwnames=<optimized out>) at /tmp/build/80754af9/python_1598874792229/work/Objects/call.c:408
#42 0x0000555555721740 in call_function (kwnames=0x0, oparg=<optimized out>,
    pp_stack=<synthetic pointer>) at /tmp/build/80754af9/python_1598874792229/work/Python/ceval.c:4616
#43 _PyEval_EvalFrameDefault (f=<optimized out>, throwflag=<optimized out>)
    at /tmp/build/80754af9/python_1598874792229/work/Python/ceval.c:3110
#44 0x00005555556b8e7b in function_code_fastcall (globals=<optimized out>, nargs=1,
    args=<optimized out>, co=<optimized out>)
    at /tmp/build/80754af9/python_1598874792229/work/Objects/call.c:283
#45 _PyFunction_FastCallKeywords (func=<optimized out>, stack=0x55555852ce40, nargs=1,
    kwnames=<optimized out>) at /tmp/build/80754af9/python_1598874792229/work/Objects/call.c:408
#46 0x00005555557214b6 in call_function (kwnames=0x0, oparg=<optimized out>,
    pp_stack=<synthetic pointer>) at /tmp/build/80754af9/python_1598874792229/work/Python/ceval.c:4616
#47 _PyEval_EvalFrameDefault (f=<optimized out>, throwflag=<optimized out>)
    at /tmp/build/80754af9/python_1598874792229/work/Python/ceval.c:3124
#48 0x00005555556b8e7b in function_code_fastcall (globals=<optimized out>, nargs=2,
    args=<optimized out>, co=<optimized out>)
    at /tmp/build/80754af9/python_1598874792229/work/Objects/call.c:283
#49 _PyFunction_FastCallKeywords (func=<optimized out>, stack=0x7fff898429e8, nargs=2,
    kwnames=<optimized out>) at /tmp/build/80754af9/python_1598874792229/work/Objects/call.c:408
#50 0x00005555557214b6 in call_function (kwnames=0x0, oparg=<optimized out>,
    pp_stack=<synthetic pointer>) at /tmp/build/80754af9/python_1598874792229/work/Python/ceval.c:4616
#51 _PyEval_EvalFrameDefault (f=<optimized out>, throwflag=<optimized out>)
    at /tmp/build/80754af9/python_1598874792229/work/Python/ceval.c:3124
#52 0x000055555566985b in function_code_fastcall (globals=<optimized out>, nargs=2,
    args=<optimized out>, co=0x7ffff75c5930)
    at /tmp/build/80754af9/python_1598874792229/work/Objects/call.c:283
#53 _PyFunction_FastCallDict (func=<optimized out>, args=0x7fffffffd7e0, nargs=2,
    kwargs=<optimized out>) at /tmp/build/80754af9/python_1598874792229/work/Objects/call.c:322
#54 0x00005555556887ce in object_vacall (callable=0x7ffff75d1a70, vargs=<optimized out>)
    at /tmp/build/80754af9/python_1598874792229/work/Objects/call.c:1200
#55 0x00005555556e276d in _PyObject_CallMethodIdObjArgs (obj=<optimized out>, name=<optimized out>)
    at /tmp/build/80754af9/python_1598874792229/work/Objects/call.c:1250
#56 0x0000555555671fdc in import_find_and_load (abs_name=0x7ffff74b6c90)
    at /tmp/build/80754af9/python_1598874792229/work/Python/import.c:1652
#57 PyImport_ImportModuleLevelObject (name=0x7ffff74b6c90, globals=<optimized out>,
--Type <RET> for more, q to quit, c to continue without paging--
    locals=<optimized out>, fromlist=0x7ffff5df4e90, level=0)
    at /tmp/build/80754af9/python_1598874792229/work/Python/import.c:1764
#58 0x0000555555724479 in import_name (level=0x5555558be2e0 <small_ints+160>, fromlist=0x7ffff5df4e90,
    name=0x7ffff74b6c90, f=0x555555967280)
    at /tmp/build/80754af9/python_1598874792229/work/Python/ceval.c:4770
#59 _PyEval_EvalFrameDefault (f=<optimized out>, throwflag=<optimized out>)
    at /tmp/build/80754af9/python_1598874792229/work/Python/ceval.c:2600
#60 0x0000555555668829 in _PyEval_EvalCodeWithName (_co=0x7ffff73eb420, globals=<optimized out>,
    locals=<optimized out>, args=<optimized out>, argcount=<optimized out>, kwnames=0x0, kwargs=0x0,
    kwcount=0, kwstep=2, defs=0x0, defcount=0, kwdefs=0x0, closure=0x0, name=0x0, qualname=0x0)
    at /tmp/build/80754af9/python_1598874792229/work/Python/ceval.c:3930
#61 0x0000555555669714 in PyEval_EvalCodeEx (_co=<optimized out>, globals=<optimized out>,
    locals=<optimized out>, args=<optimized out>, argcount=<optimized out>, kws=<optimized out>,
    kwcount=0, defs=0x0, defcount=0, kwdefs=0x0, closure=0x0)
    at /tmp/build/80754af9/python_1598874792229/work/Python/ceval.c:3959
#62 0x000055555566973c in PyEval_EvalCode (co=<optimized out>, globals=<optimized out>,
    locals=<optimized out>) at /tmp/build/80754af9/python_1598874792229/work/Python/ceval.c:524
#63 0x0000555555780f14 in run_mod (mod=<optimized out>, filename=<optimized out>,
    globals=0x7ffff7597be0, locals=0x7ffff7597be0, flags=<optimized out>, arena=<optimized out>)
    at /tmp/build/80754af9/python_1598874792229/work/Python/pythonrun.c:1035
#64 0x000055555578b331 in PyRun_FileExFlags (fp=0x5555558c4180, filename_str=<optimized out>,
    start=<optimized out>, globals=0x7ffff7597be0, locals=0x7ffff7597be0, closeit=1,
    flags=0x7fffffffdd10) at /tmp/build/80754af9/python_1598874792229/work/Python/pythonrun.c:988
#65 0x000055555578b523 in PyRun_SimpleFileExFlags (fp=0x5555558c4180, filename=<optimized out>,
    closeit=1, flags=0x7fffffffdd10)
    at /tmp/build/80754af9/python_1598874792229/work/Python/pythonrun.c:429
#66 0x000055555578c655 in pymain_run_file (p_cf=0x7fffffffdd10,
    filename=0x5555558c3900 L"test_autograd.py", fp=0x5555558c4180)
    at /tmp/build/80754af9/python_1598874792229/work/Modules/main.c:462
#67 pymain_run_filename (cf=0x7fffffffdd10, pymain=0x7fffffffde20)
    at /tmp/build/80754af9/python_1598874792229/work/Modules/main.c:1652
#68 pymain_run_python (pymain=0x7fffffffde20)
    at /tmp/build/80754af9/python_1598874792229/work/Modules/main.c:2913
#69 pymain_main (pymain=0x7fffffffde20)
    at /tmp/build/80754af9/python_1598874792229/work/Modules/main.c:3460
#70 0x000055555578c77c in _Py_UnixMain (argc=<optimized out>, argv=<optimized out>)
    at /tmp/build/80754af9/python_1598874792229/work/Modules/main.c:3495
#71 0x00007ffff7dc30b3 in __libc_start_main (main=0x555555649c90 <main>, argc=2, argv=0x7fffffffdf88,
    init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffffffdf78)
    at ../csu/libc-start.c:308
#72 0x0000555555730ff0 in _start () at ../sysdeps/x86_64/elf/start.S:103

I don’t know how to interpret this output though.

I also uploaded the outputs of ldd build/lib/* here and of torch/lib/* here. What am I supposed to look for?

Looks like it is related to MAGMA. Apparently it’s not dynamically linked, though.
Either MAGMA being built against a different CUDA version or it having incompatible threading (I don’t even know if that is a thing with MAGMA) could be not so good.
You could try to disable MAGMA and see if it helps (but of course, you’ll then be missing quite a bit of CUDA linear algebra).
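As for what to look for in those ldd outputs: mismatched or duplicated CUDA runtimes across the libraries is the usual suspect. Something like the following sketch lists the CUDA-related dependencies of each built library (the paths are examples from a typical source checkout and may need adjusting):

```shell
# List CUDA-related dependencies of each built library; the glob paths are
# examples from a source checkout and may need adjusting to yours.
for lib in build/lib/*.so torch/lib/*.so; do
  echo "== $lib"
  # grep keeps only CUDA/MAGMA-related lines; "|| true" keeps the loop going
  # when a library has none of them (or ldd fails on a non-library file).
  ldd "$lib" 2>/dev/null | grep -E 'libcudart|libcublas|libcudnn|libmagma' || true
done
```

If two libraries pull in different libcudart versions, that kind of mismatch is what the remark about MAGMA using a different CUDA version is hinting at.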

I must admit I’m not sure what’s next. I only ever build my stuff on bare-metal Debian systems and it mostly works for me, so I don’t have a lot of expertise debugging funny linking issues.

I had used MAGMA installed via conda. Is this wrong? Apparently I have to register with http://magma.maths.usyd.edu.au/magma and wait a few days to install it manually. Is this something every PyTorch developer does?

By the way: I’ve spent a couple of days full time so far trying to build PyTorch, and it is starting to feel like I’m doing something significantly wrong. I had set up the machine (with Ubuntu), installed CUDA, cuDNN etc., and then followed the instructions in CONTRIBUTING.md. As a potential first-time contributor: are there any other ways to get started? I’m only doing this because I think I need my own build of PyTorch so I can get the tests in test/run_test.sh to pass before making a pull request. The actual coding to fix the issue took a mere fraction of the time I’ve invested in trying to build PyTorch.
What else can I do? Would it help if I used Docker? Do I have to switch to AWS?

It probably is a decidedly nonstandard setup, but I literally use plain Debian/unstable, even with the CUDA libraries from the non-free section of their archive (they just don’t have cuDNN, so I take that from NVIDIA). The other day profiling with Kineto would not work, but other than that it usually just works, and I use cmake, python3-dev, python3-typing-extensions, python3-numpy-dev, python3-yaml and so on from Debian. No conda, Docker or other funny business. :slight_smile: But again, this isn’t the standard way, and maybe someone else has better advice.

Unless you need MAGMA for what you are changing, you could just compile without it (USE_MAGMA=OFF or so).
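That would look roughly like this; the flag name is taken from the advice above, so double-check the exact spelling against setup.py / the CMake options in your checkout, since environment-variable handling has changed between versions:

```shell
# Sketch: disable MAGMA before rebuilding (flag name per the advice above;
# verify it against setup.py in your checkout).
export USE_MAGMA=0
echo "USE_MAGMA=$USE_MAGMA"
# Then rebuild from the repo root:
#   python setup.py clean
#   python setup.py develop
```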
You can also start individual tests by running python test/test_something.py TestCase.test_me, or use pytest with -k to filter by name.
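As a self-contained illustration of that single-test workflow (the file, class and test names here are throwaway placeholders, not real PyTorch tests):

```shell
# Create a throwaway test file in a temp directory to demonstrate
# name-based filtering; the names are placeholders.
tmpdir=$(mktemp -d)
cat > "$tmpdir/demo_test.py" <<'EOF'
import unittest

class TestDemo(unittest.TestCase):
    def test_backward(self):        # selected by -k backward
        self.assertEqual(2 * 2, 4)

    def test_unrelated(self):       # filtered out
        self.assertTrue(True)
EOF
# unittest's -k does substring matching, so only test_backward runs:
(cd "$tmpdir" && python3 -m unittest -v -k backward demo_test)
# pytest offers the same filter: pytest demo_test.py -k backward
```

The output should report that only one of the two tests ran.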

If it’s a targeted fix, you have a test case for it, and you are reasonably sure that it works, you could probably just open a PR for it. I wouldn’t debug code with the CI, but I have certainly submitted patches where I had overlooked failing CI cases because I ran tests locally too narrowly. So far no one has bitten my head off for that. :slight_smile:
Running git-clang-format and python3 -m flake8 is probably a good idea to get things right on the first submission.

Best regards

Thomas