torch is unable to detect cuda for driver 470.182.03

Describe the bug

torch (2.1.1+cu121) is unable to detect CUDA with driver 470.182.03

import torch
torch.cuda.is_available()

output:

/home/ml/virtualenv/lib/python3.10/site-packages/torch/cuda/__init__.py:138: UserWarning: CUDA initialization: CUDA unknown error - this may be due to an incorrectly set up environment, e.g. changing env variable CUDA_VISIBLE_DEVICES after program start. Setting the available devices to be zero. (Triggered internally at ../c10/cuda/CUDAFunctions.cpp:108.)
  return torch._C._cuda_getDeviceCount() > 0
False

Expected output:

True

Versions

torch – 2.0.1+cu117
CUDA Version – 12.1
Driver Version – 470.182.03

nvidia-smi output:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.182.03   Driver Version: 470.182.03   CUDA Version: 12.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A10G         On   | 00000000:00:1B.0 Off |                    0 |
|  0%   16C    P8    15W / 300W |      0MiB / 22731MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

Can someone please advise?

Your NVIDIA driver (470.182.03) is too old for CUDA 12.x, which needs a 525-series or newer driver on Linux, unless you explicitly enabled CUDA’s forward compatibility.
Update the driver and it should work.
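
As a rough way to check this before reinstalling anything, here is a minimal sketch that compares a driver version string against the minimum for CUDA 12.x. The threshold `525.60.13` is my reading of the Linux entry in NVIDIA's CUDA release notes, so treat it as an assumption and verify it for your platform. The helper names (`parse_driver_version`, `driver_supports_cuda_12`, `query_driver_version`) are made up for this example and are not part of torch or any NVIDIA tool.

```python
import subprocess

# Assumption: minimum Linux driver for CUDA 12.x without forward compatibility,
# taken from NVIDIA's CUDA release notes. Verify for your platform.
MIN_DRIVER_CUDA_12 = (525, 60, 13)


def parse_driver_version(text):
    """Turn a version string such as '470.182.03' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def driver_supports_cuda_12(driver_version):
    """Return True if the driver meets the assumed CUDA 12.x minimum."""
    return parse_driver_version(driver_version) >= MIN_DRIVER_CUDA_12


def query_driver_version():
    """Ask nvidia-smi for the driver version of the first GPU."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.splitlines()[0].strip()


# Example usage on a machine with an NVIDIA driver installed:
#   version = query_driver_version()
#   print(version, driver_supports_cuda_12(version))
```

With the driver from this thread, `driver_supports_cuda_12("470.182.03")` returns `False`, which matches the diagnosis above. The check says nothing about forward compatibility, which has to be verified separately.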

Thanks @ptrblck for quick check on this.

I have explicitly enabled CUDA forward compatibility, but the result is still the same.

If you depend on forward compatibility, you could use the NGC containers, as they support it, and enabling it manually might not always be trivial.
