PyTorch with CUDA10.2 on 2080Ti - no CUDA-capable device is detected

ViennaFlying · December 14, 2023, 2:16am

Hi,

I am trying to get a working Windows 10 installation of the “CUDA Toolkit 10.2.89” with my “RTX 2080Ti” graphics card.

I understand that to merely consume CUDA runtime services from third-party applications like PyTorch, I would only need the Windows 10 Nvidia driver, as PyTorch brings its own CUDA runtime.

Unfortunately that did not work with PyTorch, even though I have a PyTorch version built for CUDA 10.2 (cu102) installed.

I use miniconda3 and Python 3.9.18 and installed PyTorch with the following command:
pip install torch==1.10.0+cu102 torchvision==0.11.0+cu102 -f https://download.pytorch.org/whl/torch_stable.html

(TT) C:\Users\Roman>pip list
Package           Version
----------------- ------------
cupy-cuda102      12.3.0
fastrlock         0.8.2
numpy             1.26.2
Pillow            10.1.0
pip               23.3.1
setuptools        69.0.2
torch             1.10.0+cu102
torchvision       0.11.0+cu102
typing_extensions 4.9.0
wheel             0.42.0

python -c "from torch.utils.cpp_extension import CUDA_HOME; print(CUDA_HOME)" → No CUDA runtime is found, using CUDA_HOME='C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2'
python -c "import torch; print(torch.__version__)" → 1.10.0+cu102
python -c "import torch; print(torch.__path__)" → ['C:\Users\Roman\miniconda3\envs\TT\lib\site-packages\torch']
python -c "import torch; print(torch.version.cuda)" → 10.2
python -c "import torch; print(torch.backends.cudnn.version())" → 7605
python -c "import torch; print(torch.cuda.is_available())" → False (expect True)
python -c "import torch; print(torch.cuda.device_count())" → 0 (expect 1)
python -c "import torch; print(torch.cuda.get_device_name(0))" → RuntimeError: No CUDA GPUs are available (expect "GeForce RTX 2080 Ti")
python -c "import torch; torch.cuda.set_device(0)" → RuntimeError: No CUDA GPUs are available
python -c "import torch; print(torch.cuda.current_device())" → RuntimeError: No CUDA GPUs are available (expect 0)
python -c "import torch; print(torch.zeros(1).cuda())" → RuntimeError: No CUDA GPUs are available

So PyTorch cannot find my CUDA GPU and fails with the error “No CUDA GPUs are available”.
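
The individual checks above can be gathered into one small script. This is only a sketch: the function name `cuda_report` is made up for illustration, and it relies on nothing beyond the `torch` calls already used in this post. It skips the per-device queries when no device is visible, so it prints a report instead of raising “No CUDA GPUs are available”.

```python
import importlib.util


def cuda_report():
    """Collect the PyTorch CUDA facts checked above into one dictionary.

    Hypothetical helper for illustration; not part of PyTorch.
    """
    # Do not fail if PyTorch is missing from the active environment.
    if importlib.util.find_spec("torch") is None:
        return {"torch_installed": False}

    import torch

    report = {
        "torch_installed": True,
        "torch_version": torch.__version__,
        "built_for_cuda": torch.version.cuda,  # None on CPU-only builds
        "cuda_available": torch.cuda.is_available(),
        "device_count": torch.cuda.device_count(),
    }
    # Query names only for devices that are actually visible;
    # get_device_name raises when there is no device.
    report["device_names"] = [
        torch.cuda.get_device_name(index)
        for index in range(report["device_count"])
    ]
    return report


if __name__ == "__main__":
    for key, value in cuda_report().items():
        print(f"{key}: {value}")
```

With the setup described here, such a report would be expected to show a CUDA build (10.2) together with `cuda_available: False` and `device_count: 0`, which points away from the wheel itself and towards the driver or the system.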

I have tried this with many different Nvidia drivers, all with the same result. Currently I use the latest Nvidia driver, 546.29.

So I then tried to focus on the CUDA Toolkit only, and according to the CUDA wiki, I installed the CUDA Toolkit 10.2.89. This was a 3-part installation of “cuda_10.2.89_win10_network.exe”, “cuda_10.2.1_win10.exe” and “cuda_10.2.2_win10.exe”.
I also installed the recommended “cuDNN 8.7.0 for CUDA 10.2” according to the NVidia docs as “cudnn-windows-x86_64-8.7.0.84_cuda10-archive.zip” and extracted the “bin”, “include” and “lib\x64” files into the corresponding “CUDA\v10.2” subfolders “bin”, “include” and “lib\x64”.

The RTX 2080Ti is a “Turing” card with a Compute Capability of “7.5” and should have support for CUDA Toolkit SDK 10.0 - 10.2.

According to the NVidia docs, the minimum “Windows x86_64 Driver Version” for CUDA Toolkit “CUDA 10.2.89” is >= 441.22.
I have installed the latest desktop Win10 Nvidia driver, 546.29, which is also shown in the output of “nvidia-smi.exe”:
C:\>nvidia-smi.exe
Wed Dec 13 20:32:02 2023
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 546.29                 Driver Version: 546.29       CUDA Version: 12.3     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                     TCC/WDDM  | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 2080 Ti   WDDM  | 00000000:0E:00.0  On |                  N/A |
|  0%   47C    P8              26W / 300W |   2280MiB / 11264MiB |      0%      Default |
|                                         |                      |                  N/A |
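
Note that the “CUDA Version: 12.3” field in the nvidia-smi header is the highest CUDA version the installed driver supports, not the version of any installed toolkit. The driver requirement quoted above (>= 441.22 for CUDA 10.2.89) can be checked with a small sketch like the following. The helper names are hypothetical; `--query-gpu=driver_version --format=csv,noheader` are regular nvidia-smi query options, and the lookup returns None when nvidia-smi is not on the PATH.

```python
import shutil
import subprocess


def parse_version(text):
    """Turn a dotted version string such as '546.29' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def driver_meets_minimum(installed, minimum):
    """Compare two dotted driver versions numerically, field by field."""
    return parse_version(installed) >= parse_version(minimum)


def installed_driver_version():
    """Ask nvidia-smi for the driver version; None if it cannot be queried."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return None
    try:
        result = subprocess.run(
            [exe, "--query-gpu=driver_version", "--format=csv,noheader"],
            capture_output=True, text=True, check=False,
        )
    except OSError:
        return None
    lines = result.stdout.strip().splitlines()
    if result.returncode != 0 or not lines:
        return None
    return lines[0].strip()  # one line per GPU; take the first


if __name__ == "__main__":
    # Values from this post: installed 546.29, required >= 441.22.
    print(driver_meets_minimum("546.29", "441.22"))
    print(installed_driver_version())
```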

The installed CUDA Toolkit is reported as V10.2.89:
C:\>nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Wed_Oct_23_19:32:27_Pacific_Daylight_Time_2019
Cuda compilation tools, release 10.2, V10.2.89

When I try to run “deviceQuery.exe” I also get a “no CUDA-capable device is detected” error:
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2\extras\demo_suite>deviceQuery.exe
deviceQuery.exe Starting...

CUDA Device Query (Runtime API) version (CUDART static linking)

cudaGetDeviceCount returned 100
-> no CUDA-capable device is detected
Result = FAIL
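
The value 100 returned by cudaGetDeviceCount is the CUDA error code for “no device”. One way to tell whether the driver itself or the toolkit's runtime fails to see the GPU is to ask the driver API directly, bypassing the toolkit. The sketch below does that through ctypes with the driver API functions cuInit and cuDeviceGetCount. It assumes the driver library is named nvcuda.dll on Windows (libcuda.so.1 on Linux), and the function name `driver_device_count` is invented for this example.

```python
import ctypes
import sys


def driver_device_count():
    """Return (status, count) from the CUDA driver API.

    Returns (None, None) if the driver library cannot be loaded.
    A status of 0 means success; 100 means no CUDA-capable device.
    """
    try:
        if sys.platform == "win32":
            lib = ctypes.WinDLL("nvcuda.dll")   # installed with the display driver
        else:
            lib = ctypes.CDLL("libcuda.so.1")
    except OSError:
        return None, None

    # cuInit must succeed before any other driver API call.
    status = lib.cuInit(0)
    if status != 0:
        return status, None

    count = ctypes.c_int(0)
    status = lib.cuDeviceGetCount(ctypes.byref(count))
    return status, (count.value if status == 0 else None)


if __name__ == "__main__":
    status, count = driver_device_count()
    print(f"driver API status: {status}, device count: {count}")
```

If this also reported a non-zero status such as 100, that would suggest the problem sits at the driver level rather than in the CUDA Toolkit or in PyTorch.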

I also checked within “NVIDIA Control Panel” → “Manage 3D Settings” → “Global Settings” that the Setting “CUDA - GPUs” is set to “All” and the checkbox for “NVIDIA GeForce RTX 2080 Ti” is checked.
I only have this ONE graphics card in this PC, and the processor, an “AMD Ryzen 7 3700X 8-Core Processor”, has no integrated GPU.
Windows 10 Pro 22H2 19045.3803, Windows Feature Experience Pack 1000.19053.1000.0

Environment variables (listing only the variables relevant to this problem):
C:\>SET
CommonProgramFiles=C:\Program Files\Common Files
CommonProgramFiles(x86)=C:\Program Files (x86)\Common Files
CommonProgramW6432=C:\Program Files\Common Files
CUDA_PATH=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2
CUDA_PATH_V10_2=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2
CUDNN_PATH=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2
DriverData=C:\Windows\System32\Drivers\DriverData
HOMEDRIVE=C:
HOMEPATH=\Users\Roman
LD_LIBRARY_PATH=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2\lib\x64
LOCALAPPDATA=C:\Users\Roman\AppData\Local
NUMBER_OF_PROCESSORS=16
NVCUDASAMPLES10_2_ROOT=C:\ProgramData\NVIDIA Corporation\CUDA Samples\v10.2
NVCUDASAMPLES_ROOT=C:\ProgramData\NVIDIA Corporation\CUDA Samples\v10.2
NVTOOLSEXT_PATH=C:\Program Files\NVIDIA Corporation\NvToolsExt
OS=Windows_NT
Path=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2\bin;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2\lib\x64;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2\include;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2\libnvvp;c:\windows\system32;c:\windows;c:\windows\system32\wbem;c:\windows\system32\windowspowershell\v1.0;c:\windows\system32\openssh;c:\windows\system32;c:\program files\dotnet;C:\Program Files (x86)\NVIDIA Corporation\PhysX\Common;C:\Program Files\Git\cmd;C:\Program Files\NVIDIA Corporation\Nsight Compute 2019.5.0;C:\Users\Roman\AppData\Local\Programs\Python\Python39\Scripts;C:\Users\Roman\AppData\Local\Programs\Python\Python39;C:\Users\Roman\AppData\Local\Microsoft\WindowsApps;C:\Users\Roman\AppData\Local\Programs\Microsoft VS Code\bin
PATHEXT=.COM;.EXE;.BAT;.CMD;.VBS;.VBE;.JS;.JSE;.WSF;.WSH;.MSC
PROCESSOR_ARCHITECTURE=AMD64
PROCESSOR_IDENTIFIER=AMD64 Family 23 Model 113 Stepping 0, AuthenticAMD
PROCESSOR_LEVEL=23
PROCESSOR_REVISION=7100
ProgramData=C:\ProgramData
ProgramFiles=C:\Program Files
ProgramFiles(x86)=C:\Program Files (x86)
ProgramW6432=C:\Program Files
SystemDrive=C:
SystemRoot=C:\WINDOWS
TEMP=C:\Users\Roman\AppData\Local\Temp
TMP=C:\Users\Roman\AppData\Local\Temp
windir=C:\WINDOWS
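
One variable that does not appear in this list but can hide GPUs from CUDA applications when it is set is CUDA_VISIBLE_DEVICES. A quick way to review all CUDA-related variables of the current process is sketched below. The function name and the prefix filter are assumptions made for this example, not an official tool.

```python
import os


def cuda_env_summary(env=None):
    """Collect CUDA-related environment variables for review.

    `env` defaults to the current process environment; passing a plain
    dict makes the helper easy to try out. Hypothetical helper.
    """
    env = dict(os.environ) if env is None else dict(env)
    prefixes = ("CUDA", "CUDNN", "NVCUDA")  # assumed set of interesting prefixes
    related = {
        name: value
        for name, value in sorted(env.items())
        if name.upper().startswith(prefixes)
    }
    return {
        "related": related,
        # None means the variable is not set, so it restricts nothing.
        "visible_devices": env.get("CUDA_VISIBLE_DEVICES"),
    }


if __name__ == "__main__":
    summary = cuda_env_summary()
    for name, value in summary["related"].items():
        print(f"{name}={value}")
    print("CUDA_VISIBLE_DEVICES:", summary["visible_devices"])
```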

I have really run out of ideas about what could be wrong with my setup.
My problem seems NOT to be related to Python or PyTorch, as it does not work from a pure “NVIDIA Developer view” either.
What else can I check?

Greetings,
Roman (from Vienna)

ptrblck · December 14, 2023, 2:14pm

I agree, as no application is able to find the device. Was this setup ever working? If so, what changed? Since the issue is not related to PyTorch, I would recommend also asking on, e.g., the NVIDIA board for more troubleshooting.

ViennaFlying · December 15, 2023, 1:52am

Hi @ptrblck, thanks for coming back to me. No, this setup was never working on this Windows installation/card. I already asked about my problem on the Nvidia developer forum, in the thread “Cuda 10.2 on 2080Ti - no CUDA-capable device is detected” (CUDA Setup and Installation, NVIDIA Developer Forums).

But as you seem to be an expert, I want to ask you to clarify some PyTorch questions for me, with regards to CUDA.

A)
As I understand it, PyTorch brings its own CUDA runtime if you use a PyTorch version which is compiled with CUDA support (like the PyTorch+cuXXX versions).
Does that mean that for such a PyTorch version I do NOT need to install the Nvidia CUDA Toolkit, and I only need a recent Nvidia video driver installed, which brings the Nvidia side of the CUDA runtime?
Would I need the CUDA Toolkit only if I wanted to compile my own version of PyTorch with CUDA support?

B)
Can I only use a PyTorch version which is compiled with a CUDA SDK version that supports my Nvidia card's architecture and compute capability, or is there backward compatibility?
My GeForce RTX 2080Ti has a “Turing” architecture and is listed with a Compute Capability of “7.5”.
According to the Wikipedia article on CUDA, the CUDA SDK support for my card would then be 10.0 - 10.2.
Does that mean I can only use PyTorch which is compiled against CUDA 10.0 / 10.1 / 10.2, or can I also use e.g. “PyTorch+Cuda11.8” or “PyTorch+Cuda12.1”, and would that be backwards compatible with 10.x?
Or does the CUDA version PyTorch was compiled with need to match exactly (and if yes, against what version of which component on my Windows system)? If I do not need to have the CUDA runtime installed, then which component would it need to match?

Thanks for your clarifying answers,
Roman

ptrblck · December 15, 2023, 5:02am

To A) Yes, your explanation is correct and the PyTorch binaries with CUDA support will install all needed CUDA runtime dependencies. You would only need to install an NVIDIA driver which supports the corresponding CUDA version.

To B)

The support matrix also shows that all CUDA 11 and 12 versions still support compute capability 7.5 and thus your Turing architecture.

You can use any current stable or nightly PyTorch binary with CUDA 11.8 or 12.1, as we are explicitly building PyTorch for Turing GPUs in these binaries. The binaries support compute capability 3.7 - 9.0 if PyTorch+CUDA 11.8 is installed and 5.0 - 9.0 if PyTorch+CUDA 12.1 is installed.
We mostly don’t rely on backwards compatibility but build for these architectures explicitly. Compute capability 8.9 is an exception, as it’s binary compatible with 8.6 as well as 8.0, so explicitly building for it would only increase the binary size without giving you any advantage.
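
Whether an installed binary was built for a given GPU can be checked against the list of architectures it was compiled for. The sketch below is a plain-Python helper; the name `binary_targets` and the sample list are illustrative and not taken from a real build. With a working installation, the list would come from `torch.cuda.get_arch_list()`.

```python
def binary_targets(arch_list, major, minor):
    """Return True if arch_list contains native code for sm_<major><minor>.

    Hypothetical helper; arch_list is expected in the 'sm_75' style
    used by torch.cuda.get_arch_list().
    """
    return f"sm_{major}{minor}" in arch_list


# Illustrative list only. A real one would come from
#   import torch; torch.cuda.get_arch_list()
# which may be empty as long as no CUDA device is available.
sample_arch_list = ["sm_50", "sm_60", "sm_70", "sm_75", "sm_80", "sm_86", "sm_90"]

if __name__ == "__main__":
    # Turing (RTX 2080 Ti) has compute capability 7.5.
    print(binary_targets(sample_arch_list, 7, 5))
```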

ViennaFlying · December 15, 2023, 5:56am

Thanks for the comprehensive answers.

So basically, with a recent Nvidia driver I should be able to run any PyTorch+cuXXX binary with my 2080Ti.
Unfortunately I still face the error “RuntimeError: No CUDA GPUs are available”, regardless of what I try.

I have already installed several Nvidia drivers (clean install), always removing the previous driver with DDU first. I also checked that there were no stale contents in the registry or within the C:\Windows\System32\DriverStore\FileRepository directory.

My computer has no integrated GPU.

Do you have any ideas what else could be the problem for this error?

Thanks,
Roman

ptrblck · December 15, 2023, 2:37pm

Not really, as I’m not using Windows. In the past, users saw similar issues when their laptop turned off the dedicated GPU to save power, but I’m unsure whether this could be related.