torch.cuda.is_available() is False even though CUDA is installed

I’m working on a Google Cloud instance running Ubuntu 16.04. I’ve installed CUDA (for a Tesla K80) following the recommended steps and got nvidia-smi working (it reports CUDA version 10.2 with driver version 440.33.01).

I’ve tried installing several versions of PyTorch, but torch.cuda.is_available() still returns False. I’ve checked the topics on this forum regarding this problem, but they haven’t helped, and I still don’t understand what causes it.
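
For reference, this is roughly the check I’m running (the exact PyTorch build is just whichever one I last installed):

nvidia-smi                                                    # driver 440.33.01, reports CUDA 10.2
python -c "import torch; print(torch.__version__)"
python -c "import torch; print(torch.version.cuda)"           # CUDA version PyTorch was built against
python -c "import torch; print(torch.cuda.is_available())"    # prints False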

Any thoughts?

How did you install PyTorch? Could you post the log of the installation, please?

I’m working on a Google Cloud instance where I’m hoping to run a Docker container, so I’ve tried several ways of doing that.

I’ve figured out that if I don’t use Docker at all and install PyTorch (with cudatoolkit=10.1) via conda, it suddenly works. But even when I use the same installation method (conda) inside Docker, torch.cuda.is_available() again returns False. As for the base image I’m building my Docker image from, I’ve tried several (a PyTorch image and some other images that claim to have CUDA 10.1 installed).
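
For reference, the bare-metal install that works for me looks roughly like this (channel and version pin taken from the PyTorch install selector for CUDA 10.1; the environment name is arbitrary):

conda create -n torch-cuda101 python=3.7 -y
conda activate torch-cuda101
conda install pytorch torchvision cudatoolkit=10.1 -c pytorch
python -c "import torch; print(torch.cuda.is_available())"    # True outside Docker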

The only new information I have is that it works outside Docker but not inside (a problem I’ve never had on my work servers)…

I can try installing it again inside Docker to reproduce the logs, but I don’t think the installation itself is the problem; it looks more like some Docker<->NVIDIA issue… although I’ve never run into those before.

How do you start your container?
Are you using nvidia-docker or docker run -it --gpus all ...?
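
For example, something like one of these (the image name is just a placeholder):

nvidia-docker run -it your-image:latest bash           # legacy nvidia-docker wrapper
docker run -it --gpus all your-image:latest bash       # Docker >= 19.03 with nvidia-container-toolkit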

Okay, I got it working finally…

The problem was that I hadn’t installed the nvidia-container-toolkit. Since I’ve never dealt with this sort of thing at work (the devops guy did), I didn’t know it was a necessary step to make this work.
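
For anyone else hitting this, the steps I followed were roughly the ones from NVIDIA’s container-toolkit guide for Ubuntu (the repository URLs are the ones documented at the time and may have moved since):

distribution=$(. /etc/os-release; echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
sudo systemctl restart docker
docker run --rm --gpus all nvidia/cuda:10.1-base nvidia-smi    # sanity check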

Anyway, thank you for your help, but this was never a PyTorch issue it seems.

This happened to me when I was using CUDA 10.2 and updating PyTorch through conda. Reinstalling torch (the entire environment) can help, though it’s not efficient.

@kaiseryet

Yeah, I did multiple reinstallations to no avail, but as I said, installing the nvidia-container-toolkit fixed it, because my issue was accessing the GPU through Docker. It took me a while to figure this out because torch.version.cuda == 10.1 and torch.backends.cudnn.enabled == True led me to believe Docker wasn’t the problem.
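
In hindsight, the distinction is that torch.version.cuda and torch.backends.cudnn.enabled only describe what the PyTorch binary was built with; they say nothing about whether the container can reach the driver at runtime. A quick way to see both sides (run inside the container):

python -c "import torch; print(torch.version.cuda, torch.backends.cudnn.enabled)"      # build-time info only
python -c "import torch; print(torch.cuda.is_available(), torch.cuda.device_count())"  # runtime driver access
nvidia-smi    # only works if the container was started with GPU access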

Hi everyone, the same error occurred to me recently.

  • host environment info: Ubuntu 16.04 with the following CUDA setup (nvidia-smi output):
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.82       Driver Version: 440.82       CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 108...  Off  | 00000000:02:00.0 Off |                  N/A |
| 26%   44C    P0    57W / 250W |      0MiB / 11178MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 108...  Off  | 00000000:03:00.0 Off |                  N/A |
| 20%   39C    P0    59W / 250W |      0MiB / 11178MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  GeForce GTX 108...  Off  | 00000000:82:00.0 Off |                  N/A |
| 20%   39C    P0    58W / 250W |      0MiB / 11178MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  GeForce GTX 108...  Off  | 00000000:83:00.0 Off |                  N/A |
| 20%   37C    P0    56W / 250W |      0MiB / 11178MiB |      3%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
  • the result of nvcc -V was:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Sun_Jul_28_19:07:16_PDT_2019
Cuda compilation tools, release 10.1, V10.1.243
  • the Docker info:
Docker version 19.03.12, build 48a66213fe

I was running the Docker container with:

docker run --runtime=nvidia -it --shm-size 8G --name="shm_updated" --gpus 2 braindecode:1.0 /bin/bash

Finally, torch.cuda.is_available() returned False even though torch.version.cuda == 10.2 and torch.backends.cudnn.enabled == True, which was strange because nvidia-smi could still show the GPU info inside the container:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.82       Driver Version: 440.82       CUDA Version: N/A      |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 108...  Off  | 00000000:02:00.0 Off |                  N/A |
| 25%   42C    P0    56W / 250W |      0MiB / 11178MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 108...  Off  | 00000000:03:00.0 Off |                  N/A |
| 19%   38C    P0    59W / 250W |      0MiB / 11178MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
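
One extra thing I checked inside the container (my guess at narrowing it down) is whether the host driver’s CUDA library is actually mounted, since torch.cuda.is_available() needs libcuda from the driver, not just the toolkit:

ldconfig -p | grep libcuda     # if this is empty, the driver library wasn't injected into the container
env | grep NVIDIA_             # NVIDIA_VISIBLE_DEVICES / NVIDIA_DRIVER_CAPABILITIES, if set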

Unfortunately, it couldn’t be solved by installing the nvidia-container-toolkit. I’m stuck here. Does anyone have any suggestions about this issue?

  • PLUS: the container above is one I built myself, so I also tried the official CUDA container with
docker run --runtime=nvidia -it --rm nvidia/cuda:9.0-runtime-ubuntu16.04 nvidia-smi

and the result seems fine :thinking:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.82       Driver Version: 440.82       CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 108...  Off  | 00000000:02:00.0 Off |                  N/A |
| 30%   47C    P0    57W / 250W |      0MiB / 11178MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 108...  Off  | 00000000:03:00.0 Off |                  N/A |
| 24%   42C    P0    60W / 250W |      0MiB / 11178MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  GeForce GTX 108...  Off  | 00000000:82:00.0 Off |                  N/A |
| 22%   42C    P0    59W / 250W |      0MiB / 11178MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  GeForce GTX 108...  Off  | 00000000:83:00.0 Off |                  N/A |
| 21%   40C    P0    56W / 250W |      0MiB / 11178MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
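
One difference I want to compare next is the NVIDIA_* environment variables, since (as far as I understand) the official nvidia/cuda images set them in their Dockerfile while my own image may not:

docker run --runtime=nvidia --rm nvidia/cuda:9.0-runtime-ubuntu16.04 env | grep NVIDIA_
docker run --runtime=nvidia --rm braindecode:1.0 env | grep NVIDIA_
# If NVIDIA_DRIVER_CAPABILITIES is missing or doesn't include "compute", libcuda isn't mounted,
# which would match the "CUDA Version: N/A" shown above. Re-running with it set explicitly:
docker run --runtime=nvidia --rm -e NVIDIA_VISIBLE_DEVICES=all -e NVIDIA_DRIVER_CAPABILITIES=compute,utility \
    braindecode:1.0 python -c "import torch; print(torch.cuda.is_available())"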

Same problem!
Have you solved it?