Hi, I recently updated the old NVIDIA driver on my machine, and since then CUDA is no longer available in PyTorch: torch.cuda.is_available() returns False with the error shown below (Error 804: forward compatibility was attempted on non supported HW). Is there a way to narrow down where the problem comes from? Any tips would be appreciated.
Below are my current CUDA-related versions and environment details.
python3
Python 3.11.5 (main, Sep 11 2023, 13:54:46) [GCC 11.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.cuda.is_available()
/home/d/anaconda3/lib/python3.11/site-packages/torch/cuda/__init__.py:138: UserWarning: CUDA initialization: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 804: forward compatibility was attempted on non supported HW (Triggered internally at ../c10/cuda/CUDAFunctions.cpp:108.)
return torch._C._cuda_getDeviceCount() > 0
False
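One way I could imagine narrowing this down is to ask the driver library directly which CUDA version it supports, bypassing PyTorch entirely. A minimal sketch using ctypes (cuDriverGetVersion is a standard CUDA Driver API call; the only assumption is that libcuda.so from the driver is on the loader path):

```python
import ctypes
import ctypes.util


def decode_cuda_version(v):
    """Decode the integer from cuDriverGetVersion, e.g. 12020 -> (12, 2)."""
    return v // 1000, (v % 1000) // 10


if __name__ == "__main__":
    # Locate the driver's libcuda; if this fails, the driver install itself
    # is likely broken (sketch only; adjust for your system).
    path = ctypes.util.find_library("cuda")
    if path is None:
        print("libcuda not found - the driver install may be broken")
    else:
        libcuda = ctypes.CDLL(path)
        ver = ctypes.c_int(0)
        rc = libcuda.cuDriverGetVersion(ctypes.byref(ver))
        if rc == 0:
            major, minor = decode_cuda_version(ver.value)
            print(f"driver reports CUDA {major}.{minor}")
        else:
            print(f"cuDriverGetVersion failed with CUresult {rc}")
```

If this call already fails (or reports a different version than nvidia-smi's 12.2), the problem is below PyTorch, in the driver stack itself.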
nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Mon_Apr__3_17:16:06_PDT_2023
Cuda compilation tools, release 12.1, V12.1.105
Build cuda_12.1.r12.1/compiler.32688072_0
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.147.05 Driver Version: 525.147.05 CUDA Version: 12.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA TITAN Xp On | 00000000:1B:00.0 Off | N/A |
| 23% 22C P8 8W / 250W | 1MiB / 12288MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 1 NVIDIA TITAN Xp On | 00000000:1C:00.0 Off | N/A |
| 23% 22C P8 8W / 250W | 1MiB / 12288MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 2 NVIDIA TITAN Xp On | 00000000:1D:00.0 Off | N/A |
| 23% 24C P8 9W / 250W | 1MiB / 12288MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 3 NVIDIA TITAN Xp On | 00000000:1E:00.0 Off | N/A |
| 23% 23C P8 9W / 250W | 1MiB / 12288MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 4 NVIDIA TITAN Xp On | 00000000:3D:00.0 Off | N/A |
| 23% 21C P8 8W / 250W | 1MiB / 12288MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 5 NVIDIA TITAN Xp On | 00000000:3F:00.0 Off | N/A |
| 23% 20C P8 8W / 250W | 1MiB / 12288MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 6 NVIDIA TITAN Xp On | 00000000:40:00.0 Off | N/A |
| 23% 23C P8 8W / 250W | 1MiB / 12288MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 7 NVIDIA TITAN Xp On | 00000000:41:00.0 Off | N/A |
| 23% 20C P8 9W / 250W | 1MiB / 12288MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
grep 'CUDNN_MAJOR\|CUDNN_MINOR\|CUDNN_PATCHLEVEL' /usr/include/cudnn_version.h
#define CUDNN_MAJOR 9
#define CUDNN_MINOR 0
#define CUDNN_PATCHLEVEL 0
#define CUDNN_VERSION (CUDNN_MAJOR * 10000 + CUDNN_MINOR * 100 + CUDNN_PATCHLEVEL)
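For reference, the macro above packs the version into one integer; with the values shown it works out to 90000, i.e. cuDNN 9.0.0:

```python
# Values taken from /usr/include/cudnn_version.h above.
CUDNN_MAJOR, CUDNN_MINOR, CUDNN_PATCHLEVEL = 9, 0, 0
CUDNN_VERSION = CUDNN_MAJOR * 10000 + CUDNN_MINOR * 100 + CUDNN_PATCHLEVEL
print(CUDNN_VERSION)  # -> 90000
```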
pip list | grep torch
torch 2.1.1+cu121
torch-tb-profiler 0.4.3
torchaudio 2.1.1
torchlibrosa 0.1.0
torchvision 0.16.1+cu121
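In case it helps anyone debugging the same Error 804: the message "forward compatibility was attempted on non supported HW" suggests a forward-compatibility copy of libcuda (from a cuda-compat package) is being loaded on GPUs that do not support forward compatibility, such as these Pascal-era TITAN Xp cards. A sketch to look for such shadowing copies (the /usr/local/cuda/compat path is an assumption based on where cuda-compat packages typically install; adjust for your setup):

```python
import glob
import os


def find_libcuda(dirs):
    """Return any libcuda.so* files found in the given directories."""
    hits = []
    for d in filter(None, dirs):
        hits.extend(sorted(glob.glob(os.path.join(d, "libcuda.so*"))))
    return hits


if __name__ == "__main__":
    # Search the assumed compat location plus everything on LD_LIBRARY_PATH.
    search = (["/usr/local/cuda/compat"]
              + os.environ.get("LD_LIBRARY_PATH", "").split(":"))
    for hit in find_libcuda(search):
        print("possible forward-compat libcuda shadowing the driver:", hit)
```

If a compat libcuda turns up, removing it from the library path (or uninstalling the cuda-compat package) and rebooting so the new kernel module matches the user-space driver would be the first things to try.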