torch.cuda.is_available() is False for CUDA version 11.4

Hi All,

We are stuck with the PyTorch installation on our server. Below is the output of collect.py:
(collect.py reference: https://raw.githubusercontent.com/pytorch/pytorch/master/torch/utils/collect_env.py)

(gpu_env) python collect.py

Collecting environment information…
/opt/platformx/sentiment_analysis/gpu_env/lib64/python3.8/site-packages/torch/cuda/__init__.py:82: UserWarning: CUDA initialization: CUDA driver initialization failed, you might not have a CUDA gpu. (Triggered internally at …/c10/cuda/CUDAFunctions.cpp:112.)
return torch._C._cuda_getDeviceCount() > 0
PyTorch version: 1.11.0+cu113
Is debug build: False
CUDA used to build PyTorch: 11.3
ROCM used to build PyTorch: N/A

OS: Red Hat Enterprise Linux 8.6 (Ootpa) (x86_64)
GCC version: (GCC) 8.5.0 20210514 (Red Hat 8.5.0-10)
Clang version: Could not collect
CMake version: Could not collect
Libc version: glibc-2.28

Python version: 3.8.12 (default, Sep 16 2021, 10:46:05) [GCC 8.5.0 20210514 (Red Hat 8.5.0-3)] (64-bit runtime)
Python platform: Linux-4.18.0-372.13.1.el8_6.x86_64-x86_64-with-glibc2.2.5
Is CUDA available: False
CUDA runtime version: 11.4.48
GPU models and configuration: GPU 0: GRID M6-4Q
Nvidia driver version: 470.82.01
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

Versions of relevant libraries:
[pip3] numpy==1.23.1
[pip3] torch==1.11.0+cu113
[conda] Could not collect
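
For reference, the failing check can be reproduced without collect.py. A minimal sketch (run inside the same gpu_env); torch.cuda.init() forces CUDA initialization and should raise a RuntimeError carrying the underlying error rather than only the UserWarning above:

    import torch

    print(torch.cuda.is_available())  # prints False and emits the UserWarning above

    # Force CUDA initialization; if the driver cannot be initialized this
    # raises a RuntimeError with the underlying CUDA error message.
    torch.cuda.init()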

What we have tried (a quick verification snippet follows this list):
- Installing torch==1.11.0+cu113, torch==1.12.0+cu113, torch==1.11.0+cu102, and torch==1.12.0+cu102.
- Installing from .whl files for Python 3.8 and cu113.
- Upgrading pip and pip3.
- Creating a fresh virtual environment.
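
Since several wheels and fresh environments were tried, one quick sanity check (a minimal sketch, nothing specific to this setup) is to confirm which interpreter and which wheel are actually active:

    import sys
    import torch

    # Interpreter path: should point inside the intended virtual environment (gpu_env)
    print(sys.executable)

    # Installed wheel and the CUDA toolkit it was built against
    print(torch.__version__)   # e.g. 1.11.0+cu113
    print(torch.version.cuda)  # e.g. 11.3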

We know of two other options, but we are not sure whether they would work:

  1. Downgrading the CUDA toolkit from 11.4 to 11.3.
  2. Building PyTorch from source against CUDA 11.4.

We also cannot use Anaconda; only pip is allowed.

Both of these options require sudo permissions, which we don't have, so it would be great if anyone could suggest alternatives or better solutions.

Thanks

nvidia-smi output: [screenshot attached]

PyTorch is unable to communicate with the GPU because its CUDA initialization is failing:

UserWarning: CUDA initialization: CUDA driver initialization failed, you might not have a CUDA gpu. 

In case you have installed the drivers recently, make sure to reboot the node.
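
One way to narrow it down further is to query the driver API directly with ctypes, bypassing PyTorch entirely (a minimal sketch; it assumes libcuda.so.1 is loadable on the node). If cuInit already returns a non-zero error code here, the problem is in the driver/GPU setup itself, independent of which PyTorch wheel is installed:

    import ctypes

    # Talk to the CUDA driver directly, bypassing PyTorch, to see the raw
    # result of driver initialization. Assumes libcuda.so.1 is on the
    # loader path (it should be with driver 470.82.01 installed).
    libcuda = ctypes.CDLL("libcuda.so.1")

    result = libcuda.cuInit(0)
    if result != 0:  # 0 == CUDA_SUCCESS
        err_str = ctypes.c_char_p()
        libcuda.cuGetErrorString(result, ctypes.byref(err_str))
        msg = err_str.value.decode() if err_str.value else "unknown error"
        print(f"cuInit failed with error code {result}: {msg}")
    else:
        count = ctypes.c_int()
        libcuda.cuDeviceGetCount(ctypes.byref(count))
        print(f"cuInit succeeded, {count.value} CUDA device(s) visible")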

Thanks for the reply.
We tried restarting the server, but it still shows the same issue.
Is there any other way?