Error: "CUDA driver initialization failed, you might not have a CUDA gpu." on an HPC

I was trying to run rstar main.py for testing purposes on an HPC and I got this error.

I tried some debugging steps suggested by ChatGPT:

(rstar) [ingenx@rdgpu01 ~]$ nvidia-smi
Wed Feb 12 13:35:14 2025
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 545.23.06              Driver Version: 545.23.06    CUDA Version: 12.3     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA A100 80GB PCIe          Off | 00000000:3B:00.0 Off |                   On |
| N/A   51C    P0              68W / 300W |      0MiB / 81920MiB |     N/A      Default |
|                                         |                      |              Enabled |
+-----------------------------------------+----------------------+----------------------+
|   1  NVIDIA A100 80GB PCIe          Off | 00000000:5E:00.0 Off |                   On |
| N/A   56C    P0              79W / 300W |      0MiB / 81920MiB |     N/A      Default |
|                                         |                      |              Enabled |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| MIG devices:                                                                          |
+------------------+--------------------------------+-----------+-----------------------+
| GPU  GI  CI  MIG |                   Memory-Usage |        Vol|      Shared           |
|      ID  ID  Dev |                     BAR1-Usage | SM     Unc| CE ENC DEC OFA JPG    |
|                  |                                |        ECC|                       |
|==================+================================+===========+=======================|
|  No MIG devices found                                                                 |
+---------------------------------------------------------------------------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+
(rstar) [ingenx@rdgpu01 ~]$ nvcc --version
-bash: nvcc: command not found
(rstar) [ingenx@rdgpu01 ~]$ python3 -c "import torch; print('CUDA Available:', torch.cuda.is_available())"
/home/ingenx/miniconda3/envs/rstar/lib/python3.11/site-packages/torch/cuda/__init__.py:129: UserWarning: CUDA initialization: CUDA driver initialization failed, you might not have a CUDA gpu. (Triggered internally at ../c10/cuda/CUDAFunctions.cpp:108.)
  return torch._C._cuda_getDeviceCount() > 0
CUDA Available: False
(rstar) [ingenx@rdgpu01 ~]$ python3 -c "import torch; print('Number of GPUs:', torch.cuda.device_count())"
Number of GPUs: 2
(rstar) [ingenx@rdgpu01 ~]$ python3 -c "import torch; print('GPU Name:', torch.cuda.get_device_name(0))"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/ingenx/miniconda3/envs/rstar/lib/python3.11/site-packages/torch/cuda/__init__.py", line 493, in get_device_name
    return get_device_properties(device).name
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ingenx/miniconda3/envs/rstar/lib/python3.11/site-packages/torch/cuda/__init__.py", line 523, in get_device_properties
    _lazy_init()  # will define _get_device_properties
    ^^^^^^^^^^^^
  File "/home/ingenx/miniconda3/envs/rstar/lib/python3.11/site-packages/torch/cuda/__init__.py", line 319, in _lazy_init
    torch._C._cuda_init()
RuntimeError: CUDA driver initialization failed, you might not have a CUDA gpu.


I was not able to find any solution to this problem on the internet.

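Your nvidia-smi output shows MIG mode enabled on both A100s ("MIG M." is "Enabled") but also "No MIG devices found". With MIG mode on, CUDA can only run on a MIG instance, so with none created there is most likely nothing for the driver to initialize. Either deactivate MIG, or have MIG instances created and make sure the appropriate MIG slice is visible to the process.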
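
For the "make the MIG slice visible" route, here is a minimal sketch, not a definitive fix. It assumes MIG instances already exist on the node (on a shared HPC that is normally the admins' job, e.g. via `nvidia-smi mig -cgi ... -C`, or they can turn MIG off with `nvidia-smi -i 0 -mig 0`), and that `nvidia-smi -L` lists each slice with a `(UUID: MIG-...)` entry. The helper names `find_mig_uuids` and `select_first_mig_slice` are made up for this example. It picks the first MIG UUID and puts it in `CUDA_VISIBLE_DEVICES`, which has to happen before anything initializes CUDA in the process.

```python
import os
import re
import subprocess
from typing import List, Optional

# Assumption: MIG slices appear in `nvidia-smi -L` as "(UUID: MIG-...)".
MIG_UUID_RE = re.compile(r"\(UUID:\s*(MIG-[^)\s]+)\)")


def find_mig_uuids(listing: str) -> List[str]:
    """Return the MIG device UUIDs found in `nvidia-smi -L` style text."""
    return MIG_UUID_RE.findall(listing)


def select_first_mig_slice() -> Optional[str]:
    """Expose the first MIG slice to this process via CUDA_VISIBLE_DEVICES.

    Returns the chosen UUID, or None if nvidia-smi is unavailable or no
    MIG instance exists (the situation shown in the question).
    """
    try:
        listing = subprocess.run(
            ["nvidia-smi", "-L"],
            capture_output=True,
            text=True,
            check=True,
        ).stdout
    except (OSError, subprocess.CalledProcessError):
        return None

    uuids = find_mig_uuids(listing)
    if not uuids:
        return None

    # Must be set before CUDA is initialized (e.g. before the first
    # torch.cuda call), otherwise it has no effect on this process.
    os.environ["CUDA_VISIBLE_DEVICES"] = uuids[0]
    return uuids[0]


if __name__ == "__main__":
    chosen = select_first_mig_slice()
    if chosen is None:
        print("No MIG instance found: ask the admins to create one or disable MIG.")
    else:
        print("Using MIG slice:", chosen)
```

If this prints a MIG UUID, rerun the torch.cuda.is_available() check from the same process or with the same CUDA_VISIBLE_DEVICES value exported in the shell. If it reports no MIG instance, nothing can be fixed from user space and the node configuration has to change.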