After NVIDIA driver update, CUDA is disabled

Hi, I updated my old NVIDIA driver, and since then CUDA is unavailable in torch and fails with the error below. Is there any way to find out where the issue comes from? Any tips would be appreciated.

Below are my current CUDA versions.

python3
Python 3.11.5 (main, Sep 11 2023, 13:54:46) [GCC 11.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.cuda.is_available()
/home/d/anaconda3/lib/python3.11/site-packages/torch/cuda/__init__.py:138: UserWarning: CUDA initialization: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 804: forward compatibility was attempted on non supported HW (Triggered internally at ../c10/cuda/CUDAFunctions.cpp:108.)
  return torch._C._cuda_getDeviceCount() > 0
False
nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Mon_Apr__3_17:16:06_PDT_2023
Cuda compilation tools, release 12.1, V12.1.105
Build cuda_12.1.r12.1/compiler.32688072_0
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.147.05   Driver Version: 525.147.05   CUDA Version: 12.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA TITAN Xp     On   | 00000000:1B:00.0 Off |                  N/A |
| 23%   22C    P8     8W / 250W |      1MiB / 12288MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA TITAN Xp     On   | 00000000:1C:00.0 Off |                  N/A |
| 23%   22C    P8     8W / 250W |      1MiB / 12288MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   2  NVIDIA TITAN Xp     On   | 00000000:1D:00.0 Off |                  N/A |
| 23%   24C    P8     9W / 250W |      1MiB / 12288MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   3  NVIDIA TITAN Xp     On   | 00000000:1E:00.0 Off |                  N/A |
| 23%   23C    P8     9W / 250W |      1MiB / 12288MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   4  NVIDIA TITAN Xp     On   | 00000000:3D:00.0 Off |                  N/A |
| 23%   21C    P8     8W / 250W |      1MiB / 12288MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   5  NVIDIA TITAN Xp     On   | 00000000:3F:00.0 Off |                  N/A |
| 23%   20C    P8     8W / 250W |      1MiB / 12288MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   6  NVIDIA TITAN Xp     On   | 00000000:40:00.0 Off |                  N/A |
| 23%   23C    P8     8W / 250W |      1MiB / 12288MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   7  NVIDIA TITAN Xp     On   | 00000000:41:00.0 Off |                  N/A |
| 23%   20C    P8     9W / 250W |      1MiB / 12288MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
grep 'CUDNN_MAJOR\|CUDNN_MINOR\|CUDNN_PATCHLEVEL' /usr/include/cudnn_version.h
#define CUDNN_MAJOR 9
#define CUDNN_MINOR 0
#define CUDNN_PATCHLEVEL 0
#define CUDNN_VERSION (CUDNN_MAJOR * 10000 + CUDNN_MINOR * 100 + CUDNN_PATCHLEVEL)
pip list | grep torch
torch                         2.1.1+cu121
torch-tb-profiler             0.4.3
torchaudio                    2.1.1
torchlibrosa                  0.1.0
torchvision                   0.16.1+cu121

Did you restart your workstation after the driver update? If so, are you able to run any CUDA sample?

I have rebooted several times, and the CUDA sample also fails with an error:

sudo modprobe nvidia

~$ /usr/local/cuda-12.1/extras/demo_suite/deviceQuery

/usr/local/cuda-12.1/extras/demo_suite/deviceQuery Starting...

CUDA Device Query (Runtime API) version (CUDART static linking)

cudaGetDeviceCount returned 804

-> forward compatibility was attempted on non supported HW

There is no /usr/local/cuda-12.1/libcudnn* file. Could that be the issue?

Or could it be a kernel issue?
Operating System: Ubuntu 20.04.6 LTS
Kernel: Linux 5.4.0-122-generic
Architecture: x86-64

No, cuDNN and PyTorch are unrelated to your issue as it seems you are not able to initialize your NVIDIA driver, so I would recommend reinstalling it.
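
One way to confirm that the failure sits at the driver level, independent of PyTorch and cuDNN, is to call the CUDA driver API directly. The sketch below is only an illustration and was not posted in the thread: it assumes the driver library can be loaded as `libcuda.so.1`, and `driver_init_status` is a hypothetical helper name. If `cuInit` already fails here, changing the PyTorch or cuDNN installation is unlikely to help.

```python
import ctypes


def driver_init_status(lib_name="libcuda.so.1"):
    """Call cuInit(0) from the CUDA driver library, bypassing PyTorch.

    Returns (code, message). code is None if the library cannot be loaded,
    0 on success, or the nonzero CUresult value reported by the driver.
    lib_name is an assumption; adjust it if libcuda lives elsewhere.
    """
    try:
        libcuda = ctypes.CDLL(lib_name)
    except OSError as exc:
        return None, f"could not load {lib_name}: {exc}"

    # CUresult cuInit(unsigned int Flags)
    libcuda.cuInit.argtypes = [ctypes.c_uint]
    libcuda.cuInit.restype = ctypes.c_int
    code = libcuda.cuInit(0)
    if code == 0:
        return 0, "CUDA_SUCCESS"

    # CUresult cuGetErrorName(CUresult error, const char **pStr)
    libcuda.cuGetErrorName.argtypes = [ctypes.c_int, ctypes.POINTER(ctypes.c_char_p)]
    libcuda.cuGetErrorName.restype = ctypes.c_int
    name = ctypes.c_char_p()
    if libcuda.cuGetErrorName(code, ctypes.byref(name)) == 0 and name.value:
        return code, name.value.decode()
    return code, "unknown error"


if __name__ == "__main__":
    print(driver_init_status())
```

A nonzero code from this check points at the driver installation itself (kernel module and user-space libraries), which is consistent with `deviceQuery` failing with the same 804 error outside of PyTorch.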

Oh, I completely removed all drivers and the CUDA toolkit, then reinstalled driver 535, and it works now.
It was definitely a driver problem… thank you!

Driver Version: 535.161.07 CUDA Version: 12.2
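
After a driver reinstall like this, a quick sanity check from Python can confirm that PyTorch sees the GPUs again. This is a minimal sketch, not output from the thread: `cuda_report` is a hypothetical helper name, and it only uses `torch.cuda.is_available()`, `torch.cuda.device_count()`, `torch.cuda.get_device_name()`, and `torch.version.cuda`.

```python
def cuda_report():
    """Summarize what PyTorch can see of the CUDA setup.

    Returns a dict; if torch is not installed, the "torch" entry is None.
    """
    try:
        import torch
    except ImportError:
        return {"torch": None, "built_for_cuda": None,
                "cuda_available": False, "devices": []}

    available = torch.cuda.is_available()
    devices = []
    if available:
        # List the name of every GPU that the driver exposes to PyTorch.
        devices = [torch.cuda.get_device_name(i)
                   for i in range(torch.cuda.device_count())]
    return {
        "torch": torch.__version__,
        "built_for_cuda": torch.version.cuda,  # CUDA version of the wheel, e.g. from +cu121
        "cuda_available": available,
        "devices": devices,
    }


if __name__ == "__main__":
    for key, value in cuda_report().items():
        print(f"{key}: {value}")
```

On the machine from this thread, a healthy setup would be expected to report `cuda_available` as true and list the eight TITAN Xp cards.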
