PyTorch CUDA 11.2 build from source: RuntimeError: CUDA error: no kernel image is available for execution on the device

Hello,

I am trying to install PyTorch with CUDA support by building from source.
I have CUDA 11.2, NVTX 11.2, cuDNN 8.1.1.33, the NVIDIA CUDA Visual Studio Integration 11.2, and Visual Studio Community Edition 2019 16.6.30204.135. My GPU is an RTX 2070 (compute capability 7.5).

I am building PyTorch from a conda environment and have installed all the prerequisites mentioned in the guide.
Before running setup.py install --cmake, I set the following environment variables:

MAGMA_HOME F:\pytorch-source\pytorch\.jenkins\pytorch\win-test-helpers\installation-helpers\magma
LIB F:\pytorch-source\pytorch\.jenkins\pytorch\win-test-helpers\installation-helpers\mkl\lib
CMAKE_GENERATOR Ninja
TORCH_CUDA_ARCH_LIST 7.5
CMAKE_INCLUDE_PATH F:\pytorch-source\pytorch\.jenkins\pytorch\win-test-helpers\installation-helpers\mkl\include

The build and installation finish successfully. However, when I try to create a tensor on the GPU, I get the following behavior:

>>> import torch
>>> torch.cuda.is_available()
True
>>> torch.cuda.current_device()
0
>>> torch.cuda.device(0)
<torch.cuda.device object at 0x000002731D947640>
>>> torch.cuda.device_count()
1
>>> torch.cuda.get_device_name(0)
'GeForce RTX 2070'
>>> torch.randn(1, device="cuda")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RuntimeError: CUDA error: no kernel image is available for execution on the device

Can someone please guide me on troubleshooting this? It seems I may be missing some configuration parameters for the build. I am also struggling to find where the build logs are written, so that I can check the build process in more detail.
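On the question of where the build output goes: one option is to capture it yourself rather than search for a log file. The following is a minimal sketch, not part of the PyTorch build tooling; the helper name `run_and_log` and the file name `build.log` are made up for illustration. It runs a command, echoes its output to the console, and keeps a copy on disk.

```python
import subprocess
import sys


def run_and_log(cmd, log_path):
    """Run cmd, echo its output to the console, and save a copy to log_path.

    Returns the exit code of the command.
    """
    with open(log_path, "w", encoding="utf-8", errors="replace") as log:
        proc = subprocess.Popen(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # merge stderr so warnings land in the same file
            text=True,
            errors="replace",  # tolerate odd bytes in compiler output
        )
        for line in proc.stdout:
            sys.stdout.write(line)
            log.write(line)
        return proc.wait()


if __name__ == "__main__":
    # Harmless demo command. For a real build, the command would be something
    # like [sys.executable, "setup.py", "install", "--cmake"], run from the
    # PyTorch checkout (an assumption based on the commands in this thread).
    code = run_and_log([sys.executable, "-c", "print('hello from the build')"], "build.log")
    print("exit code:", code)
```

Searching the saved file for `gencode` or `TORCH_CUDA_ARCH_LIST` afterwards should show which GPU architectures the build was configured for.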

Below is the beginning of the build log, which shows the environment details and the configuration options used for the build:

– Building version 1.9.0a0+git01b1557
cmake -GNinja -DBUILD_PYTHON=True -DBUILD_TEST=True -DCMAKE_BUILD_TYPE=Release -DCMAKE_GENERATOR=Ninja -DCMAKE_INCLUDE_PATH=F:\TAID-Master\MLAV\pytorch-source\pytorch.jenkins\pytorch\win-test-helpers\installation-helpers\mkl\include -DCMAKE_INSTALL_PREFIX=F:\TAID-Master\MLAV\pytorch-source\pytorch\torch -DCMAKE_PREFIX_PATH=C:\ProgramData\Miniconda3\envs\pytorch-source\Lib\site-packages -DCUDNN_LIBRARY=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.2\lib\x64 -DNUMPY_INCLUDE_DIR=C:\ProgramData\Miniconda3\envs\pytorch-source\lib\site-packages\numpy\core\include -DPYTHON_EXECUTABLE=C:\ProgramData\Miniconda3\envs\pytorch-source\python.exe -DPYTHON_INCLUDE_DIR=C:\ProgramData\Miniconda3\envs\pytorch-source\include -DPYTHON_LIBRARY=C:\ProgramData\Miniconda3\envs\pytorch-source/libs/python38.lib -DTORCH_BUILD_VERSION=1.9.0a0+git01b1557 -DUSE_NUMPY=True F:\TAID-Master\MLAV\pytorch-source\pytorch
– The CXX compiler identification is MSVC 19.26.28806.0
– The C compiler identification is MSVC 19.26.28806.0
– Detecting CXX compiler ABI info
– Detecting CXX compiler ABI info - done
– Check for working CXX compiler: C:/Program Files (x86)/Microsoft Visual Studio/2019/Community/VC/Tools/MSVC/14.26.28801/bin/Hostx64/x64/cl.exe - skipped
– Detecting CXX compile features
– Detecting CXX compile features - done
– Detecting C compiler ABI info
– Detecting C compiler ABI info - done
– Check for working C compiler: C:/Program Files (x86)/Microsoft Visual Studio/2019/Community/VC/Tools/MSVC/14.26.28801/bin/Hostx64/x64/cl.exe - skipped
– Detecting C compile features
– Detecting C compile features - done
– Not forcing any particular BLAS to be found
CMake Warning at CMakeLists.txt:305 (message):
TensorPipe cannot be used on Windows. Set it to OFF

– Performing Test COMPILER_WORKS
– Performing Test COMPILER_WORKS - Success
– Performing Test SUPPORT_GLIBCXX_USE_C99
– Performing Test SUPPORT_GLIBCXX_USE_C99 - Success
– Performing Test CAFFE2_EXCEPTION_PTR_SUPPORTED
– Performing Test CAFFE2_EXCEPTION_PTR_SUPPORTED - Success
– std::exception_ptr is supported.
– Performing Test CAFFE2_NEED_TO_TURN_OFF_DEPRECATION_WARNING
– Performing Test CAFFE2_NEED_TO_TURN_OFF_DEPRECATION_WARNING - Failed
– Performing Test CAFFE2_COMPILER_SUPPORTS_AVX2_EXTENSIONS
– Performing Test CAFFE2_COMPILER_SUPPORTS_AVX2_EXTENSIONS - Success
– Current compiler supports avx2 extension. Will build perfkernels.
– Performing Test CAFFE2_COMPILER_SUPPORTS_AVX512_EXTENSIONS
– Performing Test CAFFE2_COMPILER_SUPPORTS_AVX512_EXTENSIONS - Success
– Current compiler supports avx512f extension. Will build fbgemm.
– Performing Test COMPILER_SUPPORTS_HIDDEN_VISIBILITY
– Performing Test COMPILER_SUPPORTS_HIDDEN_VISIBILITY - Failed
– Performing Test COMPILER_SUPPORTS_HIDDEN_INLINE_VISIBILITY
– Performing Test COMPILER_SUPPORTS_HIDDEN_INLINE_VISIBILITY - Failed
– Performing Test COMPILER_SUPPORTS_RDYNAMIC
– Performing Test COMPILER_SUPPORTS_RDYNAMIC - Failed
– Building using own protobuf under third_party per request.
– Use custom protobuf build.

– 3.11.4.0
– Looking for pthread.h
– Looking for pthread.h - not found
– Found Threads: TRUE
– Caffe2 protobuf include directory: $<BUILD_INTERFACE:F:/TAID-Master/MLAV/pytorch-source/pytorch/third_party/protobuf/src>$<INSTALL_INTERFACE:include>
– Trying to find preferred BLAS backend of choice: MKL
– MKL_THREADING = OMP
– Looking for sys/types.h
– Looking for sys/types.h - found
– Looking for stdint.h
– Looking for stdint.h - found
– Looking for stddef.h
– Looking for stddef.h - found
– Check size of void*
– Check size of void* - done
– Looking for cblas_sgemm
– Looking for cblas_sgemm - found
– MKL libraries: F:/TAID-Master/MLAV/pytorch-source/pytorch/.jenkins/pytorch/win-test-helpers/installation-helpers/mkl/lib/mkl_intel_lp64.lib;F:/TAID-Master/MLAV/pytorch-source/pytorch/.jenkins/pytorch/win-test-helpers/installation-helpers/mkl/lib/mkl_intel_thread.lib;F:/TAID-Master/MLAV/pytorch-source/pytorch/.jenkins/pytorch/win-test-helpers/installation-helpers/mkl/lib/mkl_core.lib;F:/TAID-Master/MLAV/pytorch-source/pytorch/.jenkins/pytorch/win-test-helpers/installation-helpers/mkl/lib/libiomp5md.lib
– MKL include directory: F:/TAID-Master/MLAV/pytorch-source/pytorch/.jenkins/pytorch/win-test-helpers/installation-helpers/mkl/include
– MKL OpenMP type: Intel
– MKL OpenMP library: F:/TAID-Master/MLAV/pytorch-source/pytorch/.jenkins/pytorch/win-test-helpers/installation-helpers/mkl/lib/libiomp5md.lib
– The ASM compiler identification is MSVC
– Found assembler: C:/Program Files (x86)/Microsoft Visual Studio/2019/Community/VC/Tools/MSVC/14.26.28801/bin/Hostx64/x64/cl.exe
CMake Deprecation Warning at third_party/googletest/CMakeLists.txt:1 (cmake_minimum_required):
Compatibility with CMake < 2.8.12 will be removed from a future version of
CMake.

** AsmJit Summary **
ASMJIT_DIR=F:/TAID-Master/MLAV/pytorch-source/pytorch/third_party/fbgemm/third_party/asmjit
ASMJIT_TEST=FALSE
ASMJIT_TARGET_TYPE=SHARED
ASMJIT_DEPS=
ASMJIT_LIBS=asmjit
ASMJIT_CFLAGS=
ASMJIT_PRIVATE_CFLAGS=-MP;-GF;-Zc:inline;-Zc:strictStrings;-Zc:threadSafeInit-;-W4
ASMJIT_PRIVATE_CFLAGS_DBG=-GS
ASMJIT_PRIVATE_CFLAGS_REL=-GS-;-O2;-Oi
– Using third party subdirectory Eigen.
-- Found PythonInterp: C:/ProgramData/Miniconda3/envs/pytorch-source/python.exe (found suitable version "3.8.8", minimum required is "3.0")
-- Found PythonLibs: C:/ProgramData/Miniconda3/envs/pytorch-source/libs/python38.lib (found suitable version "3.8.8", minimum required is "3.0")
– Could NOT find pybind11 (missing: pybind11_DIR)
– Could NOT find pybind11 (missing: pybind11_INCLUDE_DIR)
– Using third_party/pybind11.
– pybind11 include dirs: F:/TAID-Master/MLAV/pytorch-source/pytorch/cmake/…/third_party/pybind11/include
– Could NOT find MPI_C (missing: MPI_C_LIB_NAMES MPI_C_HEADER_DIR MPI_C_WORKS)
– Could NOT find MPI_CXX (missing: MPI_CXX_LIB_NAMES MPI_CXX_HEADER_DIR MPI_CXX_WORKS)
– Could NOT find MPI (missing: MPI_C_FOUND MPI_CXX_FOUND)
Reason given by package: MPI component 'Fortran' was requested, but language Fortran is not enabled.

CMake Warning at cmake/Dependencies.cmake:1045 (message):
Not compiling with MPI. Suppress this warning with -DUSE_MPI=OFF
Call Stack (most recent call first):
CMakeLists.txt:604 (include)

– Adding OpenMP CXX_FLAGS: -openmp:experimental -IF:/TAID-Master/MLAV/pytorch-source/pytorch/.jenkins/pytorch/win-test-helpers/installation-helpers/mkl/include
– Will link against OpenMP libraries: F:/TAID-Master/MLAV/pytorch-source/pytorch/.jenkins/pytorch/win-test-helpers/installation-helpers/mkl/lib/libiomp5md.lib
-- Found CUDA: C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v11.2 (found version "11.2")
-- Caffe2: CUDA detected: 11.2
-- Caffe2: CUDA nvcc is: C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v11.2/bin/nvcc.exe
-- Caffe2: CUDA toolkit directory: C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v11.2
-- Caffe2: Header version is: 11.2
-- Found CUDNN: C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v11.2/lib/x64/cudnn.lib
-- Found cuDNN: v8.1.1 (include: C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v11.2/include, library: C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v11.2/lib/x64/cudnn.lib)
-- C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v11.2/lib/x64/nvrtc.lib shorthash is aa1d5a72
CMake Warning at cmake/public/utils.cmake:365 (message):
In the future we will require one to explicitly pass TORCH_CUDA_ARCH_LIST
to cmake instead of implicitly setting it as an env variable. This will
become a FATAL_ERROR in future version of pytorch.
Call Stack (most recent call first):
cmake/public/cuda.cmake:483 (torch_cuda_get_nvcc_gencode_flag)
cmake/Dependencies.cmake:1150 (include)
CMakeLists.txt:604 (include)

-- Added CUDA NVCC flags for: -gencode;arch=compute_75,code=sm_75
– Found CUB: C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v11.2/include
CMake Warning (dev) at third_party/gloo/CMakeLists.txt:21 (option):
Policy CMP0077 is not set: option() honors normal variables. Run "cmake
--help-policy CMP0077" for policy details. Use the cmake_policy command to
set the policy and suppress this warning.

For compatibility with older versions of CMake, option is clearing the
normal variable 'BUILD_BENCHMARK'.
This warning is for project developers. Use -Wno-dev to suppress it.

CMake Warning (dev) at third_party/gloo/CMakeLists.txt:34 (option):
Policy CMP0077 is not set: option() honors normal variables. Run "cmake
--help-policy CMP0077" for policy details. Use the cmake_policy command to
set the policy and suppress this warning.
– MSVC detected
– Set USE_REDIS OFF
– Set USE_IBVERBS OFF
– Set USE_NCCL OFF
– Set USE_RCCL OFF
– Set USE_LIBUV ON
– Only USE_LIBUV is supported on Windows
– Gloo build as SHARED library

-- Found CUDA: C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v11.2 (found suitable version "11.2", minimum required is "7.0")
-- CUDA detected: 11.2
CMake Warning at cmake/Dependencies.cmake:1394 (message):
Metal is only used in ios builds.
Call Stack (most recent call first):
CMakeLists.txt:604 (include)

Generated: F:/TAID-Master/MLAV/pytorch-source/pytorch/build/third_party/onnx/onnx/onnx_onnx_torch-ml.proto
Generated: F:/TAID-Master/MLAV/pytorch-source/pytorch/build/third_party/onnx/onnx/onnx-operators_onnx_torch-ml.proto
Generated: F:/TAID-Master/MLAV/pytorch-source/pytorch/build/third_party/onnx/onnx/onnx-data_onnx_torch.proto

– ******** Summary ********
– CMake version : 3.19.6
– CMake command : C:/ProgramData/Miniconda3/envs/pytorch-source/Library/bin/cmake.exe
– System : Windows
– C++ compiler : C:/Program Files (x86)/Microsoft Visual Studio/2019/Community/VC/Tools/MSVC/14.26.28801/bin/Hostx64/x64/cl.exe
– C++ compiler version : 19.26.28806.0
– CXX flags : /DWIN32 /D_WINDOWS /GR /EHsc /w /bigobj -DUSE_PTHREADPOOL -openmp:experimental -IF:/TAID-Master/MLAV/pytorch-source/pytorch/.jenkins/pytorch/win-test-helpers/installation-helpers/mkl/include
– Build type : Release
– Compile definitions : WIN32_LEAN_AND_MEAN;TH_BLAS_MKL;_OPENMP_NOFORCE_MANIFEST;ONNX_ML=1;ONNXIFI_ENABLE_EXT=1
– CMAKE_PREFIX_PATH : C:\ProgramData\Miniconda3\envs\pytorch-source\Lib\site-packages;C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v11.2;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.2;C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v11.2
– CMAKE_INSTALL_PREFIX : F:/TAID-Master/MLAV/pytorch-source/pytorch/torch
– CMAKE_MODULE_PATH : F:/TAID-Master/MLAV/pytorch-source/pytorch/cmake/Modules;F:/TAID-Master/MLAV/pytorch-source/pytorch/cmake/public/…/Modules_CUDA_fix

– ONNX version : 1.8.0
– ONNX NAMESPACE : onnx_torch
– ONNX_BUILD_TESTS : OFF
– ONNX_BUILD_BENCHMARKS : OFF
– ONNX_USE_LITE_PROTO : OFF
– ONNXIFI_DUMMY_BACKEND : OFF
– ONNXIFI_ENABLE_EXT : OFF

– Protobuf compiler :
– Protobuf includes :
– Protobuf libraries :
– BUILD_ONNX_PYTHON : OFF

– ******** Summary ********
– CMake version : 3.19.6
– CMake command : C:/ProgramData/Miniconda3/envs/pytorch-source/Library/bin/cmake.exe
– System : Windows
– C++ compiler : C:/Program Files (x86)/Microsoft Visual Studio/2019/Community/VC/Tools/MSVC/14.26.28801/bin/Hostx64/x64/cl.exe
– C++ compiler version : 19.26.28806.0
– CXX flags : /DWIN32 /D_WINDOWS /GR /EHsc /w /bigobj -DUSE_PTHREADPOOL -openmp:experimental -IF:/TAID-Master/MLAV/pytorch-source/pytorch/.jenkins/pytorch/win-test-helpers/installation-helpers/mkl/include
– Build type : Release
– Compile definitions : WIN32_LEAN_AND_MEAN;TH_BLAS_MKL;_OPENMP_NOFORCE_MANIFEST;ONNX_ML=1;ONNXIFI_ENABLE_EXT=1
– CMAKE_PREFIX_PATH : C:\ProgramData\Miniconda3\envs\pytorch-source\Lib\site-packages;C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v11.2;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.2;C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v11.2
– CMAKE_INSTALL_PREFIX : F:/TAID-Master/MLAV/pytorch-source/pytorch/torch
– CMAKE_MODULE_PATH : F:/TAID-Master/MLAV/pytorch-source/pytorch/cmake/Modules;F:/TAID-Master/MLAV/pytorch-source/pytorch/cmake/public/…/Modules_CUDA_fix

– ONNX version : 1.4.1
– ONNX NAMESPACE : onnx_torch
– ONNX_BUILD_TESTS : OFF
– ONNX_BUILD_BENCHMARKS : OFF
– ONNX_USE_LITE_PROTO : OFF
– ONNXIFI_DUMMY_BACKEND : OFF

– Protobuf compiler :
– Protobuf includes :
– Protobuf libraries :
– BUILD_ONNX_PYTHON : OFF
– Found CUDA with FP16 support, compiling with torch.cuda.HalfTensor
– Adding -DNDEBUG to compile flags
– Checking prototype magma_get_sgeqrf_nb for MAGMA_V2
– Checking prototype magma_get_sgeqrf_nb for MAGMA_V2 - True
– Compiling with MAGMA support
– MAGMA INCLUDE DIRECTORIES: F:/TAID-Master/MLAV/pytorch-source/pytorch/.jenkins/pytorch/win-test-helpers/installation-helpers/magma/include
– MAGMA LIBRARIES: F:/TAID-Master/MLAV/pytorch-source/pytorch/.jenkins/pytorch/win-test-helpers/installation-helpers/magma/lib/magma.lib
– MAGMA V2 check: 1
– Could not find hardware support for NEON on this machine.
– No OMAP3 processor on this machine.
– No OMAP4 processor on this machine.
– Looking for cpuid.h
– Looking for cpuid.h - not found
– Performing Test NO_GCC_EBX_FPIC_BUG
– Performing Test NO_GCC_EBX_FPIC_BUG - Failed
– Performing Test C_HAS_AVX_1
– Performing Test C_HAS_AVX_1 - Success
– Performing Test C_HAS_AVX2_1
– Performing Test C_HAS_AVX2_1 - Success
– Performing Test CXX_HAS_AVX_1
– Performing Test CXX_HAS_AVX_1 - Success
– Performing Test CXX_HAS_AVX2_1
– Performing Test CXX_HAS_AVX2_1 - Success
– AVX compiler support found
– AVX2 compiler support found
– Performing Test BLAS_F2C_DOUBLE_WORKS
– Performing Test BLAS_F2C_DOUBLE_WORKS - Failed
– Performing Test BLAS_F2C_FLOAT_WORKS
– Performing Test BLAS_F2C_FLOAT_WORKS - Success
– Performing Test BLAS_USE_CBLAS_DOT
– Performing Test BLAS_USE_CBLAS_DOT - Success
– Found a library with BLAS API (mkl). Full path: (F:/TAID-Master/MLAV/pytorch-source/pytorch/.jenkins/pytorch/win-test-helpers/installation-helpers/mkl/lib/mkl_intel_lp64.lib;F:/TAID-Master/MLAV/pytorch-source/pytorch/.jenkins/pytorch/win-test-helpers/installation-helpers/mkl/lib/mkl_intel_thread.lib;F:/TAID-Master/MLAV/pytorch-source/pytorch/.jenkins/pytorch/win-test-helpers/installation-helpers/mkl/lib/mkl_core.lib;F:/TAID-Master/MLAV/pytorch-source/pytorch/.jenkins/pytorch/win-test-helpers/installation-helpers/mkl/lib/libiomp5md.lib)
– Found a library with LAPACK API (mkl).
disabling ROCM because NOT USE_ROCM is set
– MIOpen not found. Compiling without MIOpen support
– MKLDNN_CPU_RUNTIME = OMP
CMake Deprecation Warning at third_party/ideep/mkl-dnn/CMakeLists.txt:17 (cmake_minimum_required):
Compatibility with CMake < 2.8.12 will be removed from a future version of
CMake.

Update the VERSION argument value or use a … suffix to tell
CMake that the project does not need compatibility with older versions.

– Intel MKL-DNN compat: set DNNL_ENABLE_CONCURRENT_EXEC to MKLDNN_ENABLE_CONCURRENT_EXEC with value ON
– Intel MKL-DNN compat: set DNNL_BUILD_EXAMPLES to MKLDNN_BUILD_EXAMPLES with value FALSE
– Intel MKL-DNN compat: set DNNL_BUILD_TESTS to MKLDNN_BUILD_TESTS with value FALSE
– Intel MKL-DNN compat: set DNNL_LIBRARY_TYPE to MKLDNN_LIBRARY_TYPE with value STATIC
– Intel MKL-DNN compat: set DNNL_ARCH_OPT_FLAGS to MKLDNN_ARCH_OPT_FLAGS with value ``
– Intel MKL-DNN compat: set DNNL_CPU_RUNTIME to MKLDNN_CPU_RUNTIME with value OMP

– Found OpenMP_CXX: -openmp:experimental -IF:/TAID-Master/MLAV/pytorch-source/pytorch/.jenkins/pytorch/win-test-helpers/installation-helpers/mkl/include
– GPU support is disabled
– Primitive cache is enabled
– Found MKL-DNN: TRUE
– Performing Test C_HAS_THREAD
– Performing Test C_HAS_THREAD - Success
– Version: 7.0.3
– Build type: Release
– CXX_STANDARD: 14
– Performing Test has_std_14_flag
– Performing Test has_std_14_flag - Failed
– Performing Test has_std_1y_flag
– Performing Test has_std_1y_flag - Failed
– Performing Test SUPPORTS_USER_DEFINED_LITERALS
– Performing Test SUPPORTS_USER_DEFINED_LITERALS - Success
– Performing Test FMT_HAS_VARIANT
– Performing Test FMT_HAS_VARIANT - Success
– Required features: cxx_variadic_templates
– Looking for _strtod_l
– Looking for _strtod_l - found
– Not using libkineto in a Windows build.
– CUDA build detected, configuring Kineto with CUPTI support.
– Looking for backtrace
– Looking for backtrace - not found
– Could NOT find Backtrace (missing: Backtrace_LIBRARY Backtrace_INCLUDE_DIR)
-- don't use NUMA
– Performing Test COMPILER_SUPPORTS_NO_AVX256_SPLIT
– Performing Test COMPILER_SUPPORTS_NO_AVX256_SPLIT - Failed
– Using ATen parallel backend: OMP
AT_INSTALL_INCLUDE_DIR include/ATen/core
core header install: F:/TAID-Master/MLAV/pytorch-source/pytorch/build/aten/src/ATen/core/TensorBody.h
– NCCL operators skipped due to no CUDA support
– Excluding FakeLowP operators
– Including IDEEP operators
– Excluding image processing operators due to no opencv
– Excluding video processing operators due to no opencv
– MPI operators skipped due to no MPI support
– Include Observer library

–
– ******** Summary ********
– General:
– CMake version : 3.19.6
– CMake command : C:/ProgramData/Miniconda3/envs/pytorch-source/Library/bin/cmake.exe
– System : Windows
– C++ compiler : C:/Program Files (x86)/Microsoft Visual Studio/2019/Community/VC/Tools/MSVC/14.26.28801/bin/Hostx64/x64/cl.exe
– C++ compiler id : MSVC
– C++ compiler version : 19.26.28806.0
– Using ccache if found : OFF
– CXX flags : /DWIN32 /D_WINDOWS /GR /EHsc /w /bigobj -DUSE_PTHREADPOOL -openmp:experimental -IF:/TAID-Master/MLAV/pytorch-source/pytorch/.jenkins/pytorch/win-test-helpers/installation-helpers/mkl/include -DNDEBUG -DUSE_FBGEMM -DUSE_XNNPACK
– Build type : Release
– Compile definitions : WIN32_LEAN_AND_MEAN;TH_BLAS_MKL;_OPENMP_NOFORCE_MANIFEST;ONNX_ML=1;ONNXIFI_ENABLE_EXT=1;ONNX_NAMESPACE=onnx_torch;_CRT_SECURE_NO_DEPRECATE=1;MAGMA_V2;IDEEP_USE_MKL;USE_EXTERNAL_MZCRC;MINIZ_DISABLE_ZIP_READER_CRC32_CHECKS
– CMAKE_PREFIX_PATH : C:\ProgramData\Miniconda3\envs\pytorch-source\Lib\site-packages;C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v11.2;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.2;C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v11.2
– CMAKE_INSTALL_PREFIX : F:/TAID-Master/MLAV/pytorch-source/pytorch/torch

– TORCH_VERSION : 1.9.0
– CAFFE2_VERSION : 1.9.0
– BUILD_CAFFE2 : ON
– BUILD_CAFFE2_OPS : ON
– BUILD_CAFFE2_MOBILE : OFF
– BUILD_STATIC_RUNTIME_BENCHMARK: OFF
– BUILD_TENSOREXPR_BENCHMARK: OFF
– BUILD_BINARY : OFF
– BUILD_CUSTOM_PROTOBUF : ON
– Link local protobuf : ON
– BUILD_DOCS : OFF
– BUILD_PYTHON : True
– Python version : 3.8.8
– Python executable : C:/ProgramData/Miniconda3/envs/pytorch-source/python.exe
– Pythonlibs version : 3.8.8
– Python library : C:/ProgramData/Miniconda3/envs/pytorch-source/libs/python38.lib
– Python includes : C:/ProgramData/Miniconda3/envs/pytorch-source/include
– Python site-packages: Lib/site-packages
– BUILD_SHARED_LIBS : ON
– CAFFE2_USE_MSVC_STATIC_RUNTIME : OFF
– BUILD_TEST : True
– BUILD_JNI : OFF
– BUILD_MOBILE_AUTOGRAD : OFF
– BUILD_LITE_INTERPRETER: OFF
– INTERN_BUILD_MOBILE :
– USE_BLAS : 1
– BLAS : mkl
– USE_LAPACK : 1
– LAPACK : mkl
– USE_ASAN : OFF
– USE_CPP_CODE_COVERAGE : OFF
– USE_CUDA : ON
– Split CUDA : OFF
– CUDA static link : OFF
– USE_CUDNN : ON
– CUDA version : 11.2
– cuDNN version : 8.1.1
– CUDA root directory : C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v11.2
– CUDA library : C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v11.2/lib/x64/cuda.lib
– cudart library : C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v11.2/lib/x64/cudart_static.lib
– cublas library : C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v11.2/lib/x64/cublas.lib
– cufft library : C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v11.2/lib/x64/cufft.lib
– curand library : C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v11.2/lib/x64/curand.lib
– cuDNN library : C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v11.2/lib/x64/cudnn.lib
– nvrtc : C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v11.2/lib/x64/nvrtc.lib
– CUDA include path : C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v11.2/include
– NVCC executable : C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v11.2/bin/nvcc.exe
-- NVCC flags : -Xcompiler;/w;-w;-Xfatbin;-compress-all;-DONNX_NAMESPACE=onnx_torch;--use-local-env;-gencode;arch=compute_75,code=sm_75;-Xcudafe;--diag_suppress=cc_clobber_ignored,--diag_suppress=integer_sign_change,--diag_suppress=useless_using_declaration,--diag_suppress=set_but_not_used,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=implicit_return_from_non_void_function,--diag_suppress=unsigned_compare_with_zero,--diag_suppress=declared_but_not_referenced,--diag_suppress=bad_friend_decl;--Werror;cross-execution-space-call;--no-host-device-move-forward;-Xcompiler;-MD$<$CONFIG:Debug:d>;--expt-relaxed-constexpr;--expt-extended-lambda;-Xcompiler=/wd4819,/wd4503,/wd4190,/wd4244,/wd4251,/wd4275,/wd4522;-Wno-deprecated-gpu-targets;--expt-extended-lambda;-DCUDA_HAS_FP16=1;-D__CUDA_NO_HALF_OPERATORS__;-D__CUDA_NO_HALF_CONVERSIONS__;-D__CUDA_NO_BFLOAT16_CONVERSIONS__;-D__CUDA_NO_HALF2_OPERATORS__
– CUDA host compiler : C:/Program Files (x86)/Microsoft Visual Studio/2019/Community/VC/Tools/MSVC/14.26.28801/bin/Hostx64/x64/cl.exe
– NVCC --device-c : OFF
– USE_TENSORRT : OFF
– USE_ROCM : OFF
– USE_EIGEN_FOR_BLAS :
– USE_FBGEMM : ON
– USE_FAKELOWP : OFF
– USE_KINETO : OFF
– USE_FFMPEG : OFF
– USE_GFLAGS : OFF
– USE_GLOG : OFF
– USE_LEVELDB : OFF
– USE_LITE_PROTO : OFF
– USE_LMDB : OFF
– USE_METAL : OFF
– USE_PYTORCH_METAL : OFF
– USE_FFTW : OFF
– USE_MKL : ON
– USE_MKLDNN : ON
– USE_MKLDNN_CBLAS : OFF
– USE_NCCL : OFF
– USE_NNPACK : OFF
– USE_NUMPY : ON
– USE_OBSERVERS : ON
– USE_OPENCL : OFF
– USE_OPENCV : OFF
– USE_OPENMP : ON
– USE_TBB : OFF
– USE_VULKAN : OFF
– USE_PROF : OFF
– USE_QNNPACK : OFF
– USE_PYTORCH_QNNPACK : OFF
– USE_REDIS : OFF
– USE_ROCKSDB : OFF
– USE_ZMQ : OFF
– USE_DISTRIBUTED : ON
– USE_MPI : OFF
– USE_GLOO : ON
– USE_TENSORPIPE : OFF
– USE_DEPLOY : OFF
– Public Dependencies : Threads::Threads;caffe2::mkl;caffe2::mkldnn
– Private Dependencies : pthreadpool;cpuinfo;XNNPACK;fbgemm;fp16;gloo;aten_op_header_gen;foxi_loader;fmt::fmt-header-only
– Configuring done
– Generating done

What does torch.cuda.get_arch_list() print?
This error is usually raised when your device needs an architecture that was not included when PyTorch was built.
I'm not familiar with Windows builds, so I don't know exactly how sm_75 is passed to the build.
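The comparison behind that question can be sketched in a few lines. This is an illustrative helper, not a PyTorch API: `cubin_runs_on` and `has_native_kernel` are made-up names, and the rule encoded here is the general CUDA binary-compatibility rule (a binary built for `sm_XY` runs on devices with the same major version and an equal or higher minor version). Only `torch.cuda.get_device_capability` and `torch.cuda.get_arch_list` are real PyTorch calls.

```python
def cubin_runs_on(arch, capability):
    """Return True if a kernel compiled for `arch` (e.g. "sm_70") can run
    natively on a device with compute capability `capability` (e.g. (7, 5)).

    Simplification: entries with suffixes such as "sm_90a" are treated as
    not matching.
    """
    kind, _, digits = arch.partition("_")
    if kind != "sm" or not digits.isdigit():
        return False  # "compute_XX" entries are PTX, not native binaries
    major, minor = int(digits[:-1]), int(digits[-1])
    return major == capability[0] and minor <= capability[1]


def has_native_kernel(capability, arch_list):
    """Return True if any entry in arch_list is binary-compatible with the device."""
    return any(cubin_runs_on(arch, capability) for arch in arch_list)


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        torch = None
    if torch is not None and torch.cuda.is_available():
        cap = torch.cuda.get_device_capability(0)
        archs = torch.cuda.get_arch_list()
        print("device capability:", cap)
        print("built architectures:", archs)
        print("native kernel expected:", has_native_kernel(cap, archs))
    else:
        print("PyTorch with a usable CUDA device was not found; nothing to check.")
```

If this reports that a native kernel is expected and the error still appears, the architecture list alone is probably not the explanation.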

Hello @ptrblck, thanks for replying.
The command prints the following:

>>> import torch
>>> torch.cuda.get_arch_list()
['sm_75']

Yes, that is my impression as well. However, I don't really know how to pass the needed architecture to the build. I tried setting the environment variable TORCH_CUDA_ARCH_LIST=7.5, but it seems that is not enough.
Do you know how sm_75 is passed on Linux? Maybe the same mechanism applies on Windows.
Anyway, the command outputs the expected architecture, so where do I go from here?

Thank you in advance,

Yes, this looks correct, and TORCH_CUDA_ARCH_LIST is also used on Linux.
Could you add 7.0 alongside 7.5 and rebuild?
If that works, it could point to an issue with the compute capabilities in the compiler.
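For reference, the relationship between TORCH_CUDA_ARCH_LIST and the nvcc flags seen in the build log ("-gencode;arch=compute_75,code=sm_75") can be sketched as follows. This is a simplified illustration, not PyTorch's actual CMake logic: the function name `gencode_flags` is made up, and it only handles numeric entries such as "7.5" and the "+PTX" suffix.

```python
import re


def gencode_flags(arch_list):
    """Translate a TORCH_CUDA_ARCH_LIST-style string into nvcc -gencode flags.

    Entries may be separated by semicolons or whitespace. An entry ending in
    "+PTX" additionally embeds PTX, which the driver can JIT-compile for
    newer devices.
    """
    flags = []
    for entry in re.split(r"[;\s]+", arch_list.strip()):
        if not entry:
            continue
        want_ptx = entry.endswith("+PTX")
        version = entry[:-4] if want_ptx else entry
        major, minor = version.split(".")
        num = major + minor  # "7.5" -> "75"
        flags.append(f"-gencode arch=compute_{num},code=sm_{num}")
        if want_ptx:
            flags.append(f"-gencode arch=compute_{num},code=compute_{num}")
    return flags


if __name__ == "__main__":
    for flag in gencode_flags("7.0;7.5"):
        print(flag)
```

With "7.0;7.5" this yields one `sm_70` and one `sm_75` target, matching the two entries a rebuilt `torch.cuda.get_arch_list()` would be expected to report.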

Hello @ptrblck, thank you for the suggestion.
I also tried adding sm_70 as you suggested, but the behavior is the same :frowning: :

>>> torch.cuda.get_arch_list()
['sm_70', 'sm_75']
>>> torch.randn(1, device="cuda")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RuntimeError: CUDA error: no kernel image is available for execution on the device

That doesn't seem right. Could you share the exact build command you've been using so that I could try to reproduce it locally on a Turing GPU?

I tried different variants, and all of them gave the same result:

1. python setup.py install

2. python setup.py build --cmake
   python setup.py install

3. python setup.py install --cmake

Variant 3 is the last one I used.

Thank you,

This seems to be a standard build, and I'm unable to reproduce the issue:

>>> import torch
>>> torch.version.cuda
'10.2'
>>> torch.cuda.get_arch_list()
['sm_75']
>>> torch.cuda.get_device_name()
'GeForce RTX 2080 Ti'
>>> x = torch.randn(1024, 1024).cuda()
>>> y = torch.matmul(x, x)
>>> print(y.shape)
torch.Size([1024, 1024])
>>> print(y.device)
cuda:0
>>> exit()

Are you able to run any other workload on this GPU? E.g. the CUDA samples or PyTorch via a docker container or the pip wheels / conda binaries?

Yes, I was able to run the matrixMul CUDA sample:

[Matrix Multiply Using CUDA] - Starting...
GPU Device 0: "Turing" with compute capability 7.5

MatrixA(320,320), MatrixB(640,320)
Computing result using CUDA Kernel...
done
Performance= 70.49 GFlop/s, Time= 1.859 msec, Size= 131072000 Ops, WorkgroupSize= 1024 threads/block
Checking computed result for correctness: Result = PASS

NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.

C:\ProgramData\NVIDIA Corporation\CUDA Samples\v11.2\0_Simple\matrixMul\../../bin/win64/Debug/matrixMul.exe (process 14432) exited with code 0.

Unfortunately, I can't test PyTorch from a container, because that would require setting up CUDA on WSL, which needs the Windows preview builds. I don't want to install those because this is my main machine.

Regarding the conda and pip binaries: yes, I have tried them and they work, but there is a problem there as well. With the prebuilt binaries, the first instruction I execute on the GPU takes around 15 minutes. After that, all subsequent instructions in the same Python kernel execute instantly. This is one of the reasons I want to install from source.

Appreciate your effort in helping me with this.

Thanks for the update. So it seems to be related to this issue.

I asked this in the other thread already, but are you installing the latest PyTorch version (1.8.1) or the nightly?

As for the prebuilt conda packages, yes. I currently have PyTorch 1.8.1, but in the past I also tried the nightly version and had the same issue.

However, the issue I am encountering when installing from source seems different from the problem I have with the conda binaries.

Anyway, the main goal is to get an environment in which I can use CUDA with PyTorch :sweat_smile:

I think they might be related.
The slow startup with the binaries points to the CUDA JIT, which is used when a compute capability is missing, and the error in your source build also says that the expected compute capability is missing for your device.
Are you using any other GPU in this system or only the Turing one?
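One way to quantify the slow first call is to time the first and second execution of the same GPU operation. The sketch below is an illustration only; `time_first_and_second` is a made-up helper, and the `sync` argument is there because CUDA calls return asynchronously, so `torch.cuda.synchronize` has to run before the clock is stopped. A slow first call on its own does not prove that JIT compilation is happening, since CUDA context creation also costs time, but a gap of minutes rather than seconds would be consistent with the JIT explanation.

```python
import time


def time_first_and_second(op, sync=lambda: None):
    """Call op() twice and return (first_seconds, second_seconds).

    sync is called after each op() so asynchronous work is finished before
    the timer stops.
    """
    timings = []
    for _ in range(2):
        start = time.perf_counter()
        op()
        sync()
        timings.append(time.perf_counter() - start)
    return tuple(timings)


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        torch = None
    if torch is not None and torch.cuda.is_available():
        first, second = time_first_and_second(
            lambda: torch.randn(1, device="cuda"), torch.cuda.synchronize
        )
        print(f"first call:  {first:.3f} s")
        print(f"second call: {second:.3f} s")
    else:
        print("PyTorch with a usable CUDA device was not found; nothing to time.")
```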

I use only the Turing one. There is also the integrated Intel GPU, but it's disabled.

I don't know if this matters, but I previously used CUDA 10.2 on this machine. Before trying to build PyTorch from source, I uninstalled it and installed 11.2.

I have some updates.

I uninstalled all my CUDA components and installed CUDA 10.2 from scratch. I managed to build PyTorch with CUDA 10.2, and that is when I discovered the following behavior:

>>> import torch
>>> torch.randn(1, device="cuda")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RuntimeError: CUDA error: no kernel image is available for execution on the device
>>> x = torch.randn(1024, 1024).cuda()
>>> print(x.device)
cuda:0
>>> y = torch.matmul(x, x)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RuntimeError: CUDA error: CUBLAS_STATUS_INTERNAL_ERROR when calling cublasCreate(handle)

So I think the switch to CUDA 10.2 has nothing to do with this behavior. I think it behaved the same way before, when I was on CUDA 11.2.

Does this info point in any direction?

Hi, have you tried uninstalling PyTorch before running pip install or building from source?

pip uninstall torch -y
pip uninstall torch -y
pip uninstall torch -y
pip uninstall torch -y
pip uninstall torch -y
conda uninstall pytorch -y
conda uninstall pytorch -y
conda uninstall pytorch -y
conda uninstall pytorch -y
conda uninstall pytorch -y

Also, if you build from source, it's worth running python setup.py clean a few times before python setup.py install.
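To confirm that no stale copy is shadowing the fresh build, it can help to check which `torch` Python would actually import. A minimal sketch, using only the standard library; `module_origin` is a made-up helper name. Note that starting Python from inside the PyTorch source checkout makes the local `torch` directory importable, so the check is best run from another directory.

```python
import importlib.util


def module_origin(name):
    """Return the file a top-level module would be loaded from, or None if
    it cannot be found. The module itself is not imported."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


if __name__ == "__main__":
    origin = module_origin("torch")
    if origin is None:
        print("torch is not importable from this environment")
    else:
        print("torch would be imported from:", origin)
```

If the printed path points at an old site-packages install rather than the freshly built one, the uninstall steps above have not removed everything yet.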

Hello,

Yes, I created a new conda environment and also ran python setup.py clean before installing.
I give up and will switch to a Linux distribution. I hope the binaries work there. :slight_smile: