I have successfully installed the NVIDIA driver and the cudatoolkit (the latter via conda). However, I am not able to use CUDA in PyTorch, even though PyTorch itself installed without errors.
Previously, I was using PyTorch with CUDA 8.0 and wanted to upgrade. I removed and purged all CUDA packages with:
sudo apt-get --purge remove cuda
sudo apt-get autoremove
dpkg --list |grep "^rc" | cut -d " " -f 3 | xargs sudo dpkg --purge
Then I updated my NVIDIA drivers to version 410 via the PPA (Ubuntu 16.04):
sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt-get update
sudo apt-get install nvidia-410
Everything worked smoothly. The output of nvidia-smi:
Fri Aug 23 22:29:48 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.78       Driver Version: 410.78       CUDA Version: N/A      |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 108...  Off  | 00000000:01:00.0  On |                  N/A |
| 25%   35C    P8    13W / 250W |    531MiB / 11177MiB |      1%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1445      G   /usr/lib/xorg/Xorg                           317MiB |
|    0      2035      G   compiz                                       101MiB |
|    0      3572      G   ...uest-channel-token=13099850080781834209   110MiB |
+-----------------------------------------------------------------------------+
The output of cat /proc/driver/nvidia/version:
NVRM version: NVIDIA UNIX x86_64 Kernel Module 410.78 Sat Nov 10 22:09:04 CST 2018
GCC version: gcc version 4.9.4 (Ubuntu 4.9.4-2ubuntu1~16.04)
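The driver version reported in that file can be compared against the minimum that a given CUDA toolkit needs. The sketch below is only an illustration: the helper names are hypothetical, and the 410.48 minimum for CUDA 10.0 on Linux is my reading of NVIDIA's release notes, so it should be double-checked there.

```python
import re

# Assumption: minimum Linux driver for CUDA 10.0, as I read NVIDIA's release notes.
MIN_DRIVER_FOR_CUDA_10_0 = (410, 48)


def parse_driver_version(text):
    """Return the kernel module version as a tuple of ints, or None if absent."""
    match = re.search(r"Kernel Module\s+(\d+(?:\.\d+)+)", text)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def driver_is_new_enough(text, minimum=MIN_DRIVER_FOR_CUDA_10_0):
    """True if a version was found and it is at least `minimum`."""
    version = parse_driver_version(text)
    return version is not None and version >= minimum


if __name__ == "__main__":
    try:
        with open("/proc/driver/nvidia/version") as handle:
            contents = handle.read()
    except OSError:
        # File is missing when the kernel module is not loaded.
        contents = ""
    print(parse_driver_version(contents), driver_is_new_enough(contents))
```

For the NVRM line shown above, the parsed version is (410, 78), which passes this check, so on paper the driver should be recent enough for cudatoolkit 10.0.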
Since I wanted conda to manage my CUDA version, I installed cudatoolkit in a conda environment (Python 3.6):
conda install pytorch torchvision cudatoolkit=10.0 -c pytorch
Again, everything installed without errors. When I run:
import torch
print(torch.cuda.device_count()) # --> 0
print(torch.version.cuda) # --> 10.0.130
the CUDA version is reported correctly, but no device is found, and any attempt to use CUDA fails with the following error:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/rana/anaconda3/envs/py36torch12cu10/lib/python3.6/site-packages/torch/cuda/__init__.py", line 178, in _lazy_init
    _check_driver()
  File "/home/rana/anaconda3/envs/py36torch12cu10/lib/python3.6/site-packages/torch/cuda/__init__.py", line 99, in _check_driver
    http://www.nvidia.com/Download/index.aspx""")
AssertionError:
Found no NVIDIA driver on your system. Please check that you
have an NVIDIA GPU and installed a driver from
http://www.nvidia.com/Download/index.aspx
I restarted, removed all environment variables that might have caused issues (LD_LIBRARY_PATH), removed and reinstalled conda, and tried CUDA 9.2, but nothing works. I am not sure what the issue could be. Any ideas?
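To make sure no leftover variable still points at the old installation, a quick scan of the environment could look like the sketch below. The function name is hypothetical, and matching on the substrings "cuda" and "nvidia" is just a heuristic I am assuming is good enough here.

```python
import os


def cuda_related_env(environ=None):
    """Return variables whose name or value mentions CUDA or NVIDIA."""
    environ = os.environ if environ is None else environ
    needles = ("cuda", "nvidia")  # heuristic substrings, not an exhaustive list
    return {
        name: value
        for name, value in environ.items()
        if any(n in name.lower() or n in value.lower() for n in needles)
    }


if __name__ == "__main__":
    for name, value in sorted(cuda_related_env().items()):
        print(f"{name}={value}")
```

Run inside the activated conda environment, this lists entries such as an LD_LIBRARY_PATH that still contains a CUDA 8.0 directory, or a CUDA_VISIBLE_DEVICES setting, if any are present.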
I searched a bit and found this PyTorch thread. Since I completely removed CUDA from my system, this shouldn’t be the problem, but I think it may somehow be related.
EDIT:
It isn’t surprising given my error, but following this issue, I checked:
torch._C._cuda_getDriverVersion() # -> 0
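The same question can be asked of the driver library directly, without going through PyTorch. This is a minimal sketch, assuming the driver's user-space library is exposed under its usual Linux name libcuda.so.1; cuInit and cuDriverGetVersion are CUDA Driver API calls, while the wrapper function name is hypothetical.

```python
import ctypes


def query_driver_version(library_name="libcuda.so.1"):
    """Return (status, version) from the driver library, or None if it cannot be loaded."""
    try:
        libcuda = ctypes.CDLL(library_name)
    except OSError:
        # The dynamic loader could not find or open the library.
        return None
    status = libcuda.cuInit(0)  # 0 means CUDA_SUCCESS
    if status != 0:
        return status, None
    version = ctypes.c_int(0)
    status = libcuda.cuDriverGetVersion(ctypes.byref(version))
    return status, version.value


if __name__ == "__main__":
    print(query_driver_version())
```

A result of None would mean the loader cannot find the library at all, which would point at the driver's user-space packages rather than at conda or PyTorch. A non-zero status would mean the library loads but the driver cannot initialise. As far as I understand, a successful call reports the version as 1000 * major + 10 * minor.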