This seems to be a common issue, but I really can't root-cause this one.
I'm trying to set up PyTorch + CUDA on an AWS p3 instance (NVIDIA Tesla V100 GPUs).
Torch output:
torch.__version__ # 2.0.1+cu117
torch.cuda.device_count() # --> 0
torch.cuda.is_available() # --> False
torch.version.cuda # --> 11.7
torch.backends.cudnn.version() # 8500
torch.zeros(1).cuda() # "RuntimeError: Found no NVIDIA driver on your system"
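In case it's useful, a minimal sanity-check sketch (assumption on my part: an environment variable such as CUDA_VISIBLE_DEVICES could be hiding the GPUs from PyTorch even though nvidia-smi sees them):

import os
import torch

# Environment variables that can affect GPU visibility inside the venv.
for var in ("CUDA_VISIBLE_DEVICES", "NVIDIA_VISIBLE_DEVICES", "LD_LIBRARY_PATH", "CUDA_HOME"):
    print(var, "=", os.environ.get(var))

print("torch:", torch.__version__, "built for CUDA", torch.version.cuda)
print("is_available:", torch.cuda.is_available())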
pip list
output:
Package Version
------------------------ ----------
cmake 3.26.3
filelock 3.12.0
Jinja2 3.1.2
lit 16.0.5
MarkupSafe 2.1.2
mpmath 1.3.0
networkx 3.1
nvidia-cublas-cu11 11.10.3.66
nvidia-cuda-cupti-cu11 11.7.101
nvidia-cuda-nvrtc-cu11 11.7.99
nvidia-cuda-runtime-cu11 11.7.99
nvidia-cudnn-cu11 8.5.0.96
nvidia-cufft-cu11 10.9.0.58
nvidia-curand-cu11 10.2.10.91
nvidia-cusolver-cu11 11.4.0.1
nvidia-cusparse-cu11 11.7.4.91
nvidia-nccl-cu11 2.14.3
nvidia-nvtx-cu11 11.7.91
pip 22.3.1
setuptools 65.6.3
sympy 1.12
torch 2.0.1
triton 2.0.0
typing_extensions 4.6.2
wheel 0.40.0
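Since the CUDA runtime libraries above come from the nvidia-*-cu11 pip wheels, here is a small sketch to list where they ended up and compare that with LD_LIBRARY_PATH (assuming, as I understand it, that the wheels unpack their shared libraries under site-packages/nvidia/<package>/lib):

import os
import sysconfig

# Print the lib directories shipped by the nvidia-* wheels in this venv.
site_packages = sysconfig.get_paths()["purelib"]
nvidia_dir = os.path.join(site_packages, "nvidia")
if os.path.isdir(nvidia_dir):
    for pkg in sorted(os.listdir(nvidia_dir)):
        lib_dir = os.path.join(nvidia_dir, pkg, "lib")
        if os.path.isdir(lib_dir):
            print(lib_dir)
print("LD_LIBRARY_PATH =", os.environ.get("LD_LIBRARY_PATH"))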
nvidia-smi output
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.105.01 Driver Version: 515.105.01 CUDA Version: 11.7 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Tesla V100-SXM2... Off | 00000000:00:17.0 Off | 0 |
| N/A 33C P0 57W / 300W | 0MiB / 16384MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 1 Tesla V100-SXM2... Off | 00000000:00:18.0 Off | 0 |
| N/A 32C P0 56W / 300W | 0MiB / 16384MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 2 Tesla V100-SXM2... Off | 00000000:00:19.0 Off | 0 |
| N/A 33C P0 56W / 300W | 0MiB / 16384MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 3 Tesla V100-SXM2... Off | 00000000:00:1A.0 Off | 0 |
| N/A 34C P0 55W / 300W | 0MiB / 16384MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 4 Tesla V100-SXM2... Off | 00000000:00:1B.0 Off | 0 |
| N/A 33C P0 55W / 300W | 0MiB / 16384MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 5 Tesla V100-SXM2... Off | 00000000:00:1C.0 Off | 0 |
| N/A 32C P0 56W / 300W | 0MiB / 16384MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 6 Tesla V100-SXM2... Off | 00000000:00:1D.0 Off | 0 |
| N/A 32C P0 58W / 300W | 0MiB / 16384MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 7 Tesla V100-SXM2... Off | 00000000:00:1E.0 Off | 0 |
| N/A 33C P0 55W / 300W | 0MiB / 16384MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
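For reference, the same driver information can also be queried from inside the virtual environment (a sketch, assuming nvidia-smi is on the PATH of the Python process), to rule out the venv seeing a different environment than my interactive shell:

import subprocess

# Ask nvidia-smi for a machine-readable summary from within Python.
result = subprocess.run(
    ["nvidia-smi", "--query-gpu=index,name,driver_version", "--format=csv"],
    capture_output=True, text=True,
)
print(result.stdout or result.stderr)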
nvcc --version
output
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Tue_May__3_18:49:52_PDT_2022
Cuda compilation tools, release 11.7, V11.7.64
Build cuda_11.7.r11.7/compiler.31294372_0
Also, I’ve downloaded the NVIDIA samples using git clone https://github.com/NVIDIA/cuda-samples.git --branch v11.6, then built and ran them using make, and they all seem to run fine…
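Since the compiled samples work, a rough equivalent of what deviceQuery does can be reproduced from Python via the CUDA driver API (a sketch using ctypes; cuInit and cuDeviceGetCount are standard driver-API calls):

import ctypes

# Load the driver API library installed by the NVIDIA driver (not the toolkit).
cuda = ctypes.CDLL("libcuda.so.1")

count = ctypes.c_int()
# Both calls return 0 (CUDA_SUCCESS) when the driver is reachable.
print("cuInit:", cuda.cuInit(0))
print("cuDeviceGetCount:", cuda.cuDeviceGetCount(ctypes.byref(count)))
print("devices seen by the driver API:", count.value)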
Thanks.
Edit:
- ChatGPT suggested I check the permissions, so here is the output of ls -l /dev/nvidia*, although I don’t find anything specific (a quick programmatic check of the same is sketched at the end of this post):
crw-rw-rw- 1 root root 195, 0 Jun 1 02:56 /dev/nvidia0
crw-rw-rw- 1 root root 195, 1 Jun 1 02:56 /dev/nvidia1
crw-rw-rw- 1 root root 195, 2 Jun 1 02:56 /dev/nvidia2
crw-rw-rw- 1 root root 195, 3 Jun 1 02:56 /dev/nvidia3
crw-rw-rw- 1 root root 195, 4 Jun 1 02:56 /dev/nvidia4
crw-rw-rw- 1 root root 195, 5 Jun 1 02:56 /dev/nvidia5
crw-rw-rw- 1 root root 195, 6 Jun 1 02:56 /dev/nvidia6
crw-rw-rw- 1 root root 195, 7 Jun 1 02:56 /dev/nvidia7
crw-rw-rw- 1 root root 195, 255 Jun 1 02:56 /dev/nvidiactl
/dev/nvidia-caps:
total 0
cr-------- 1 root root 248, 1 Jun 1 02:56 nvidia-cap1
cr--r--r-- 1 root root 248, 2 Jun 1 02:56 nvidia-cap2
- I also checked the CUDA_HOME variable: echo $CUDA_HOME now returns /usr/local/cuda-11.7 (a similar path to which nvcc, which returns /usr/local/cuda-11.7/bin/nvcc).
- If that’s relevant, I’m running Python 3.9 in a virtual environment created specifically for this.
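For the permissions point above, a minimal sketch to confirm that the venv’s Python process itself can open the device nodes listed there:

import glob
import os

# Check read/write access to the NVIDIA device nodes from this process.
for dev in sorted(glob.glob("/dev/nvidia*")):
    print(dev, "read:", os.access(dev, os.R_OK), "write:", os.access(dev, os.W_OK))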