Found no NVIDIA driver on your system (but it's there!)

vdw · January 19, 2019, 5:55am

I’ve added a GeForce GTX 1080 Ti to my machine (running Ubuntu 18.04 and Anaconda with Python 3.7) to utilize the GPU when using PyTorch. Both cards are correctly identified:

$ lspci | grep VGA
03:00.0 VGA compatible controller: NVIDIA Corporation GF119 [NVS 310] (rev a1)
04:00.0 VGA compatible controller: NVIDIA Corporation GP102 [GeForce GTX 1080 Ti] (rev a1)

The NVS 310 handles my 2-monitor setup; I only want to use the 1080 for PyTorch. I also installed the latest NVIDIA drivers currently in the repository, and that seems to be fine:

$ nvidia-smi 
Sat Jan 19 12:42:18 2019       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.87                 Driver Version: 390.87                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  NVS 310             Off  | 00000000:03:00.0 N/A |                  N/A |
| 30%   60C    P0    N/A /  N/A |    461MiB /   963MiB |     N/A      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 108...  Off  | 00000000:04:00.0 Off |                  N/A |
|  0%   41C    P8    10W / 250W |      2MiB / 11178MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0                    Not Supported                                       |
+-----------------------------------------------------------------------------+

According to the NVIDIA docs, driver version 390.xx supports CUDA 9.1 (9.1.85). Since this is also the version in the Ubuntu repositories, I simply installed the CUDA Toolkit with:

$ sudo apt-get install nvidia-cuda-toolkit

And again, this seems to be all right:

$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Nov__3_21:07:56_CDT_2017
Cuda compilation tools, release 9.1, V9.1.85

and

$ apt-cache policy nvidia-cuda-toolkit
nvidia-cuda-toolkit:
  Installed: 9.1.85-3ubuntu1
  Candidate: 9.1.85-3ubuntu1
  Version table:
 *** 9.1.85-3ubuntu1 500
        500 http://sg.archive.ubuntu.com/ubuntu bionic/multiverse amd64 Packages
        100 /var/lib/dpkg/status

Lastly, I’ve installed PyTorch from scratch with conda:

$ conda install pytorch torchvision -c pytorch

Also no errors as far as I can tell:

$ conda list
...
pytorch                   1.0.0           py3.7_cuda9.0.176_cudnn7.4.1_1    pytorch
...

However, PyTorch doesn’t seem to find CUDA:

$ python -c 'import torch; print(torch.cuda.is_available())'
False

In more detail, if I force PyTorch to convert a tensor x to CUDA with x.cuda() I get the error:

Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://...

What am I missing here? I’m new to this, but I think I’ve already searched the Web quite a bit for caveats such as NVIDIA driver and CUDA toolkit version requirements.

EDIT: Some more outputs from PyTorch:

print(torch.cuda.device_count())   # --> 0
print(torch.cuda.is_available())   # --> False
print(torch.version.cuda)          # --> 9.0.176

guiyanzhong · January 20, 2019, 6:06am

There is a version mismatch. Your installed CUDA version is 9.1. However, your PyTorch package is compiled with CUDA 9.0.

Possible solutions:

Install CUDA 9.0 instead of 9.1, or
Install a PyTorch build that is compiled with CUDA 9.1. If there isn’t one yet, you may need to download the PyTorch source code and compile it yourself.

BTW, I myself failed to install PyTorch with Anaconda Python 3.7, but succeeded with 3.6 (on Ubuntu 18.04).

Hope this helps.

vdw · January 20, 2019, 6:37am

As I know now, PyTorch comes with all the required CUDA, cuDNN, etc. code bundled in the binaries, and I don’t need anything special that would require compiling the sources myself :). Hence, there’s no need to install the CUDA Toolkit at all, and in fact I removed it completely. Only the Nvidia drivers and PyTorch are needed.
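To see which CUDA and cuDNN versions an installed PyTorch binary was built with, something like the following could be used. It is only a sketch: the helper name bundled_versions is made up for illustration, and it returns None when PyTorch is not installed.

```python
def bundled_versions():
    """Report the CUDA/cuDNN versions the installed PyTorch binary was built with."""
    try:
        import torch
    except ImportError:
        return None  # PyTorch is not installed in this environment
    return {
        "torch": torch.__version__,
        "cuda": torch.version.cuda,                # None for CPU-only builds
        "cudnn": torch.backends.cudnn.version(),   # None if cuDNN is unavailable
        "cuda_available": torch.cuda.is_available(),
    }


if __name__ == "__main__":
    print(bundled_versions())
```

If "cuda" shows a version but "cuda_available" is False, the binary itself is a CUDA build and the problem more likely lies with the driver or device visibility.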

I got it working for now by simply switching to a single-card setup, i.e., I removed the “small” NVS 310. Initially, I wanted to keep that card to drive all graphical output and use the 1080 solely for number crunching. However, no combination of drivers or PyTorch versions (incl. different CUDA versions) worked. With only the 1080 it was smooth sailing and worked immediately. The main difference is that I can now use a newer Nvidia driver, 415 instead of 390, which is the limit for the NVS 310.

Your comment that you had to downgrade to Python 3.6 sounds very interesting, though. I actually also tried what happens when using only the NVS 310. I knew that its compute capability was too low, and when I tested it a couple of months ago, I got the respective error messages (“Your card is too old” or something like this). But now, it couldn’t even find the Nvidia driver. PyTorch installs quite all right with Anaconda + Python 3.7, it just won’t run in CUDA mode. I will probably give a clean Anaconda + Python 3.6 a shot, just to see if it makes a difference for me. Otherwise, I will leave the NVS 310 out of the machine. I don’t even know if there’s a serious advantage in the 1080 not having to handle the graphics output.

Thanks a lot for your feedback!

airzs · March 7, 2019, 8:13am

I faced the same problem as you, and I think this is because the system cannot find the NVIDIA driver. Adding the NVIDIA path to my system path may help.

vdw · March 12, 2019, 2:51am

Thanks, but I probably will stick with my working solution, i.e., only using the 1080 without the NVS 310 in parallel. With this setup, everything went through without any issues.

chan0park · March 23, 2019, 11:05pm

In my case (on a cluster with GPUs), adding CUDA_VISIBLE_DEVICES=GPU_ID before the python command solved the problem. Instead of adding CUDA_VISIBLE_DEVICES on the command line, you can probably just add a line os.environ["CUDA_VISIBLE_DEVICES"] = "GPU_ID" before import torch.
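A minimal sketch of the in-script variant, with a few assumptions: the helper name select_gpu is hypothetical, and the index 0 is only a placeholder for your own GPU_ID. Setting CUDA_DEVICE_ORDER=PCI_BUS_ID is an optional extra not mentioned above; it should make CUDA's device numbering follow the bus order that nvidia-smi prints.

```python
import os


def select_gpu(gpu_id, match_nvidia_smi=True):
    """Restrict CUDA to the given device(s); call before CUDA is initialised."""
    if match_nvidia_smi:
        # By default CUDA may enumerate devices fastest-first; PCI_BUS_ID
        # makes the indices follow the bus order that nvidia-smi prints.
        os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
    os.environ["CUDA_VISIBLE_DEVICES"] = str(gpu_id)


select_gpu(0)  # placeholder index; substitute your own GPU_ID

if __name__ == "__main__":
    try:
        import torch  # imported only after the environment is set
    except ImportError:
        print("PyTorch is not installed in this environment")
    else:
        print(torch.cuda.is_available(), torch.cuda.device_count())
```

The environment variables only take effect if they are set before CUDA is initialised, which is why the call sits above the torch import.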

SayedMaheen · July 6, 2019, 11:48am

Same problem here: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from

Kirk · August 5, 2019, 7:21am

This works for me too

AlbertZeyer · August 9, 2023, 6:09pm

Also check the newer posts:

In my case, it did not find libcuda.so, although nvidia-smi worked fine. I was just missing /usr/lib/x86_64-linux-gnu in my LD_LIBRARY_PATH. After adding that, everything worked fine.
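One way to check for this situation from Python is sketched below. The helper names can_load and ld_library_dirs are made up for illustration, and libcuda.so.1 is assumed to be the file name under which the driver library is installed.

```python
import ctypes
import os


def can_load(name="libcuda.so.1"):
    """Return True if the dynamic loader can find and open the library."""
    try:
        ctypes.CDLL(name)
        return True
    except OSError:
        return False


def ld_library_dirs():
    """Return the non-empty entries of LD_LIBRARY_PATH as a list."""
    raw = os.environ.get("LD_LIBRARY_PATH", "")
    return [p for p in raw.split(":") if p]


if __name__ == "__main__":
    print("libcuda loadable:", can_load())
    print("LD_LIBRARY_PATH entries:", ld_library_dirs())
```

If can_load() returns False while nvidia-smi works, the library search path is a likely suspect. Note that LD_LIBRARY_PATH is normally read when the process starts, so it should be exported in the shell before launching Python rather than changed from inside the script.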