Hello, I am trying to test our model on a new physical host (Ubuntu 22.04) with one A30 GPU available, but I have encountered a strange issue.
- We followed the instructions to install the PyTorch build compatible with CUDA 12.6: pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu126
- We then followed the official instructions to install CUDA 12.6 and the NVIDIA driver (560).
- nvidia-smi and nvcc both work. nvidia-smi shows:
nvidia-smi
Fri Apr 4 18:07:55 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 560.28.03              Driver Version: 560.28.03      CUDA Version: 12.6     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA A30                     Off |   00000000:25:00.0 Off |                   On |
| N/A   26C    P0             26W /  165W |       1MiB /  24576MiB |     N/A      Default |
|                                         |                        |              Enabled |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| MIG devices:                                                                            |
+------------------+----------------------------------+-----------+-----------------------+
| GPU  GI  CI  MIG |                     Memory-Usage |        Vol|      Shared           |
|      ID  ID  Dev |                       BAR1-Usage | SM     Unc| CE ENC DEC OFA JPG    |
|                  |                                  |        ECC|                       |
|==================+==================================+===========+=======================|
|  No MIG devices found                                                                   |
+-----------------------------------------------------------------------------------------+
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
- However, in PyTorch, torch.cuda.device_count() returns 1 but torch.cuda.is_available() returns False, and torch._C._cuda_init() fails with a “No CUDA GPUs are available” error.
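For anyone who wants to reproduce this, here is a minimal sketch that collects the three checks described above in one script. The helper name collect_cuda_diagnostics is made up for illustration, and reading CUDA_VISIBLE_DEVICES is an extra assumption on my part (it is not something reported above). It uses the public torch.cuda.init() rather than calling torch._C._cuda_init() directly.

```python
import os


def collect_cuda_diagnostics():
    """Gather the CUDA-related values PyTorch reports, without raising.

    Hypothetical helper for illustration only; not part of PyTorch.
    """
    # Assumption: this variable may be unset; it is included only as context.
    info = {"CUDA_VISIBLE_DEVICES": os.environ.get("CUDA_VISIBLE_DEVICES")}

    try:
        import torch
    except ImportError as exc:
        info["import_error"] = str(exc)
        return info

    info["torch_version"] = torch.__version__
    info["built_for_cuda"] = torch.version.cuda  # None on CPU-only builds
    info["device_count"] = torch.cuda.device_count()
    info["is_available"] = torch.cuda.is_available()

    try:
        # Public entry point that forces CUDA initialization.
        torch.cuda.init()
        info["init_error"] = None
    except Exception as exc:
        # Keep the message so it can be pasted alongside the nvidia-smi output.
        info["init_error"] = f"{type(exc).__name__}: {exc}"

    return info


if __name__ == "__main__":
    for key, value in collect_cuda_diagnostics().items():
        print(f"{key}: {value}")
```

Running this on the affected host and pasting the output next to the nvidia-smi table should show the installed build, the device count, and the exact initialization error in one place.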
Any ideas on this? Thanks.