No sudo access - CUDA driver initialization failed, you might not have a CUDA gpu

Hi folks, I don’t have sudo access, and contacting the sysadmin takes a nontrivial amount of time.
I have access to two remote clusters.
Cluster 1:
Output of a script that prints the PyTorch version, the CUDA version (or "CUDA not available" if CUDA cannot be used), the OS, and the Python version:

PyTorch version: 2.2.2+cu118
/home/user_name/anaconda3/envs/llm2/lib/python3.10/site-packages/torch/cuda/__init__.py:141: UserWarning: CUDA initialization: CUDA driver initialization failed, you might not have a CUDA gpu. (Triggered internally at ../c10/cuda/CUDAFunctions.cpp:108.)
  return torch._C._cuda_getDeviceCount() > 0
CUDA Available: False
CUDA not available
System OS: Linux 4.18.0-514.el8.x86_64
Python version: 3.10.14 (main, Mar 21 2024, 16:24:04) [GCC 11.2.0]

nvcc -V returns:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Tue_Feb_27_16:19:38_PST_2024
Cuda compilation tools, release 12.4, V12.4.99
Build cuda_12.4.r12.4/compiler.33961263_0

nvidia-smi

> +-----------------------------------------------------------------------------------------+
> | NVIDIA-SMI 550.67                 Driver Version: 550.67         CUDA Version: 12.4     |
> |-----------------------------------------+------------------------+----------------------+
> | GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
> | Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
> |                                         |                        |               MIG M. |
> |=========================================+========================+======================|
> |   0  NVIDIA RTX A6000               Off |   00000000:1C:00.0 Off |                  Off |
> | 30%   33C    P8             19W /  300W |      23MiB /  49140MiB |      0%      Default |
> |                                         |                        |                  N/A |
> +-----------------------------------------+------------------------+----------------------+
> |   1  NVIDIA RTX A6000               Off |   00000000:1E:00.0 Off |                  Off |
> | 30%   33C    P8             20W /  300W |      11MiB /  49140MiB |      0%      Default |
> |                                         |                        |                  N/A |
> +-----------------------------------------+------------------------+----------------------+
> |   2  NVIDIA RTX A6000               Off |   00000000:3D:00.0 Off |                  Off |
> | 30%   32C    P8             27W /  300W |      11MiB /  49140MiB |      0%      Default |
> |                                         |                        |                  N/A |
> +-----------------------------------------+------------------------+----------------------+
> |   3  NVIDIA RTX A6000               Off |   00000000:3E:00.0 Off |                  Off |
> | 30%   35C    P8             25W /  300W |      11MiB /  49140MiB |      0%      Default |
> |                                         |                        |                  N/A |
> +-----------------------------------------+------------------------+----------------------+
> |   4  NVIDIA RTX A6000               Off |   00000000:3F:00.0 Off |                 Off* |
> |ERR!   49C    P5            ERR! /  300W |      11MiB /  49140MiB |      0%      Default |
> |                                         |                        |                  N/A |
> +-----------------------------------------+------------------------+----------------------+
> |   5  NVIDIA RTX A6000               Off |   00000000:40:00.0 Off |                  Off |
> | 30%   32C    P8              8W /  300W |      11MiB /  49140MiB |      0%      Default |
> |                                         |                        |                  N/A |
> +-----------------------------------------+------------------------+----------------------+
> |   6  NVIDIA RTX A6000               Off |   00000000:41:00.0 Off |                  Off |
> | 30%   31C    P8             16W /  300W |      11MiB /  49140MiB |      0%      Default |
> |                                         |                        |                  N/A |
> +-----------------------------------------+------------------------+----------------------+
> |   7  NVIDIA RTX A6000               Off |   00000000:5E:00.0 Off |                  Off |
> | 30%   29C    P8              7W /  300W |      11MiB /  49140MiB |      0%      Default |
> |                                         |                        |                  N/A |
> +-----------------------------------------+------------------------+----------------------+
>                                                                                          
> +-----------------------------------------------------------------------------------------+
> | Processes:                                                                              |
> |  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
> |        ID   ID                                                               Usage      |
> |=========================================================================================|
> |    0   N/A  N/A      4216      G   /usr/libexec/Xorg                               9MiB |
> |    0   N/A  N/A      4466      G   /usr/bin/gnome-shell                            4MiB |
> |    1   N/A  N/A      4216      G   /usr/libexec/Xorg                               4MiB |
> |    2   N/A  N/A      4216      G   /usr/libexec/Xorg                               4MiB |
> |    3   N/A  N/A      4216      G   /usr/libexec/Xorg                               4MiB |
> |    4   N/A  N/A      4216      G   /usr/libexec/Xorg                               4MiB |
> |    5   N/A  N/A      4216      G   /usr/libexec/Xorg                               4MiB |
> |    6   N/A  N/A      4216      G   /usr/libexec/Xorg                               4MiB |
> |    7   N/A  N/A      4216      G   /usr/libexec/Xorg                               4MiB |
> +-----------------------------------------------------------------------------------------+

Cluster 2:
Output of the same script:

PyTorch version: 2.2.2
/home/user_name/.conda/envs/llm/lib/python3.10/site-packages/torch/cuda/__init__.py:141: UserWarning: CUDA initialization: The NVIDIA driver on your system is too old (found version 11040). Please update your GPU driver by downloading and installing a new version from the URL: http://www.nvidia.com/Download/index.aspx Alternatively, go to: https://pytorch.org to install a PyTorch version that has been compiled with your version of the CUDA driver. (Triggered internally at /opt/conda/conda-bld/pytorch_1711403380909/work/c10/cuda/CUDAFunctions.cpp:108.)
  return torch._C._cuda_getDeviceCount() > 0
CUDA Available: False
CUDA not available
System OS: Linux 3.10.0-1160.36.2.el7.x86_64
Python version: 3.10.14 (main, Mar 21 2024, 16:24:04) [GCC 11.2.0]

nvcc -V

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Mon_Oct_24_19:12:58_PDT_2022
Cuda compilation tools, release 12.0, V12.0.76
Build cuda_12.0.r12.0/compiler.31968024_0

nvidia-smi

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.63.01    Driver Version: 470.63.01    CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:18:00.0 Off |                  N/A |
| 31%   32C    P8     1W / 250W |      0MiB / 11019MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA GeForce ...  Off  | 00000000:3B:00.0 Off |                  N/A |
| 31%   32C    P8     9W / 250W |      0MiB / 11019MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   2  NVIDIA GeForce ...  Off  | 00000000:86:00.0 Off |                  N/A |
| 31%   35C    P8    19W / 250W |      0MiB / 11019MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   3  NVIDIA GeForce ...  Off  | 00000000:AF:00.0 Off |                  N/A |
| 31%   34C    P8     1W / 250W |      3MiB / 11019MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

PyTorch seems to be having trouble with your NVIDIA driver, so you might need to reinstall the driver.

Is there a preferred/stable version I should install? The 12.4 reported by nvcc is pretty cutting edge.

Either driver 550.54.15 or 535.161.08 should work fine.
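
In case it helps, here is a rough Python sketch for checking whether an installed driver is new enough for the CUDA version a PyTorch build was compiled against. The minimum-driver table is an assumption based on NVIDIA's CUDA compatibility notes for Linux (450.80.02 for CUDA 11.x, 525.60.13 for CUDA 12.x), and the helper names are made up for illustration. The only external pieces it relies on are `nvidia-smi --query-gpu=driver_version --format=csv,noheader` and `torch.version.cuda`.

```python
import shutil
import subprocess

# Assumed minimum Linux driver per CUDA major release (from NVIDIA's CUDA
# compatibility notes); double-check against the current table before relying on it.
MIN_DRIVER = {11: (450, 80, 2), 12: (525, 60, 13)}


def parse_version(text):
    """Turn a version string such as '470.63.01' into (470, 63, 1)."""
    return tuple(int(part) for part in text.strip().split("."))


def driver_supports(driver_version, cuda_version):
    """Return True if the driver meets the assumed minimum for that CUDA major version."""
    major = int(cuda_version.split(".")[0])
    return parse_version(driver_version) >= MIN_DRIVER[major]


def installed_driver_version():
    """Ask nvidia-smi for the driver version; return None if it cannot be queried."""
    if shutil.which("nvidia-smi") is None:
        return None
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        capture_output=True, text=True, check=False,
    )
    lines = result.stdout.strip().splitlines()
    return lines[0].strip() if result.returncode == 0 and lines else None


if __name__ == "__main__":
    try:
        import torch
        cuda_build = torch.version.cuda  # None for CPU-only builds
    except ImportError:
        cuda_build = None
    driver = installed_driver_version()
    print("driver:", driver, "| PyTorch CUDA build:", cuda_build)
    if driver and cuda_build:
        print("driver new enough:", driver_supports(driver, cuda_build))
```

With the numbers from this thread, `driver_supports("470.63.01", "12.1")` is False, which would be consistent with the "driver too old" warning on the second cluster if that PyTorch build targets CUDA 12.x. Driver 550.67 passes the check for the cu118 build on the first cluster, so the driver version alone would not explain the failure there. None of this needs sudo.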