PyTorch 1.8 with CUDA cannot use the GPU

$ nvidia-smi
Fri Mar 5 22:28:25 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.39       Driver Version: 460.39       CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce GTX 165...  Off  | 00000000:01:00.0 Off |                  N/A |
| N/A   37C    P0     6W /  N/A |     10MiB /  3911MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1224      G   /usr/lib/xorg/Xorg                  4MiB |
|    0   N/A  N/A      1788      G   /usr/lib/xorg/Xorg                  4MiB |
+-----------------------------------------------------------------------------+

$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Sun_Jul_28_19:07:16_PDT_2019
Cuda compilation tools, release 10.1, V10.1.243

$ python
Python 3.8.5 (default, Jan 27 2021, 15:41:15)
[GCC 9.3.0] on linux

>>> import torch
>>> torch.__version__
'1.8.0+cu111'

I have installed PyTorch this way:

$ pip install torch==1.8.0+cu111 torchvision==0.9.0+cu111 torchaudio==0.8.0 -f https://download.pytorch.org/whl/torch_stable.html
Looking in links: https://download.pytorch.org/whl/torch_stable.html
Collecting torch==1.8.0+cu111
Using cached https://download.pytorch.org/whl/cu111/torch-1.8.0%2Bcu111-cp38-cp38-linux_x86_64.whl (1982.2 MB)
Collecting torchvision==0.9.0+cu111
Using cached https://download.pytorch.org/whl/cu111/torchvision-0.9.0%2Bcu111-cp38-cp38-linux_x86_64.whl (17.6 MB)
Requirement already satisfied: torchaudio==0.8.0 in /home/mona/venv/fall/lib/python3.8/site-packages (0.8.0)
Requirement already satisfied: numpy in /home/mona/venv/fall/lib/python3.8/site-packages (from torch==1.8.0+cu111) (1.20.1)
Requirement already satisfied: typing-extensions in /home/mona/venv/fall/lib/python3.8/site-packages (from torch==1.8.0+cu111) (3.7.4.3)
Requirement already satisfied: pillow>=4.1.1 in /home/mona/venv/fall/lib/python3.8/site-packages (from torchvision==0.9.0+cu111) (8.1.1)
Installing collected packages: torch, torchvision
Successfully installed torch-1.8.0+cu111 torchvision-0.9.0+cu111
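
A note on the three version numbers in this post, since they are easy to mix up: the `CUDA Version: 11.2` in the `nvidia-smi` header is the newest CUDA version the installed driver supports, `nvcc` reports the locally installed toolkit (10.1 here), and the `+cu111` pip wheel ships its own CUDA 11.1 runtime, so the local toolkit is not what the pip binaries use. Below is a minimal sketch of that comparison. The helper names are hypothetical, the parsing assumes the `+cuXYZ` suffix convention seen in the wheels above, and the final check is deliberately conservative (it ignores CUDA 11 minor-version compatibility).

```python
import re


def wheel_cuda_version(torch_version):
    """Return (major, minor) parsed from a '+cuXYZ' suffix, or None if absent."""
    match = re.search(r"\+cu(\d+)(\d)$", torch_version)
    if match is None:
        return None  # e.g. a CPU-only build or a wheel without the suffix
    return int(match.group(1)), int(match.group(2))


def driver_cuda_version(smi_header):
    """Return (major, minor) from the 'CUDA Version: X.Y' field of nvidia-smi."""
    match = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", smi_header)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def driver_covers_wheel(smi_header, torch_version):
    """Conservative check: the driver's CUDA version is at least the wheel's."""
    driver = driver_cuda_version(smi_header)
    wheel = wheel_cuda_version(torch_version)
    if driver is None or wheel is None:
        return None  # not enough information to decide
    return driver >= wheel


if __name__ == "__main__":
    header = "| NVIDIA-SMI 460.39 Driver Version: 460.39 CUDA Version: 11.2 |"
    print(driver_covers_wheel(header, "1.8.0+cu111"))
```

With the values from this post the driver side is not the problem, which is consistent with the issue being the driver state rather than the choice of wheel.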

You are still (or again) facing the initialization error from your previous post, so you might have updated the drivers again without rebooting.

Thanks a lot. I restarted again and still cannot access the GPU. Do you know how I can fix it?

(fall) mona@goku:~$ python
Python 3.8.5 (default, Jan 27 2021, 15:41:15)
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.cuda.is_available()
False
>>> torch.__version__
'1.8.0+cu111'

By the way, something weird: after the restart, the NVIDIA driver is no longer recognized:


$ nvidia-smi
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
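
Since two separate things can be broken here (the driver itself, and PyTorch's view of CUDA), it may help to check them in that order. The following is only a rough diagnostic sketch: the function names are made up for illustration, and it assumes `nvidia-smi` exits with a non-zero status when it cannot talk to the driver.

```python
import shutil
import subprocess


def driver_status():
    """Run nvidia-smi and report whether it could talk to the driver."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return "nvidia-smi not found on PATH"
    try:
        result = subprocess.run([exe], capture_output=True, text=True, timeout=30)
    except subprocess.TimeoutExpired:
        return "nvidia-smi timed out"
    if result.returncode != 0:
        # Covers messages such as "couldn't communicate with the NVIDIA driver".
        return "nvidia-smi failed: " + (result.stdout + result.stderr).strip()
    return "ok"


def torch_status():
    """Collect what PyTorch itself reports about CUDA, if it is installed."""
    try:
        import torch
    except ImportError:
        return {"installed": False}
    return {
        "installed": True,
        "version": torch.__version__,
        "built_with_cuda": torch.version.cuda,  # None for CPU-only builds
        "cuda_available": torch.cuda.is_available(),
        "device_count": torch.cuda.device_count(),
    }


if __name__ == "__main__":
    print("driver:", driver_status())
    print("torch:", torch_status())
```

If the driver check already fails, reinstalling PyTorch is unlikely to help; the driver has to be fixed first.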

I followed these exact instructions and now it works:

sudo apt purge '^.*nvidia*'
sudo apt install ubuntu-desktop
sudo apt purge "^.*cublas*"
sudo apt purge "^.*cuda*"
sudo mv /etc/X11/xorg.conf /etc/X11/xorg.conf.old
sudo apt autoremove
sudo reboot now

While rebooting, disable Secure Boot and then let Ubuntu load completely.

sudo ubuntu-drivers devices
sudo ubuntu-drivers autoinstall
sudo reboot now
$ pip uninstall torch==1.8.0+cu111 torchvision==0.9.0+cu111 torchaudio==0.8.0 
$ pip install torch==1.8.0+cu111 torchvision==0.9.0+cu111 torchaudio==0.8.0 -f https://download.pytorch.org/whl/torch_stable.html

and now:

(fall) mona@goku:~$ python
Python 3.8.5 (default, Jan 27 2021, 15:41:15) 
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.cuda.is_available()
True

also:

$ nvidia-smi
Mon Mar  8 13:29:44 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.39       Driver Version: 460.39       CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce GTX 165...  Off  | 00000000:01:00.0 Off |                  N/A |
| N/A   42C    P8     3W /  N/A |     10MiB /  3911MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1207      G   /usr/lib/xorg/Xorg                  4MiB |
|    0   N/A  N/A      1784      G   /usr/lib/xorg/Xorg                  4MiB |
+-----------------------------------------------------------------------------+

resources:
[1] Comment #12 : Bug #1871041 : Bugs : nvidia-graphics-drivers-440 package : Ubuntu
[2] Start Locally | PyTorch

@ptrblck
I got the following error (I guess my system ran an automatic update), and after repeating all the steps from my answer above, I still get exactly the same error. Do you have any tips on how I could fix this?

$ nvidia-smi
Failed to initialize NVML: Driver/library version mismatch

A few more details:
mona@goku:~$ cat /proc/driver/nvidia/version
NVRM version: NVIDIA UNIX x86_64 Kernel Module 460.39 Thu Jan 21 21:54:06 UTC 2021
GCC version:

mona@goku:~$ lsmod | grep ^nvidia
nvidia_uvm 1019904 0
nvidia_drm 53248 4
nvidia_modeset 1228800 5 nvidia_drm
nvidia 34095104 157 nvidia_uvm,nvidia_modeset
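
For context, "Driver/library version mismatch" generally means the NVIDIA user-space libraries on disk have a different version than the kernel module that is currently loaded, which typically happens when a driver package is updated without a reboot. A small sketch for reading the loaded module version from the `/proc/driver/nvidia/version` text shown above follows. The function names are hypothetical, and `installed_version` is a value you would have to look up yourself (for example from your package manager); the sketch does not discover it.

```python
import re
from pathlib import Path

PROC_FILE = Path("/proc/driver/nvidia/version")


def loaded_module_version(proc_text):
    """Extract the loaded kernel module version from the NVRM line, or None."""
    match = re.search(r"Kernel Module\s+(\d+(?:\.\d+)+)", proc_text)
    return match.group(1) if match else None


def versions_differ(proc_text, installed_version):
    """True if the loaded module version differs from the given installed one."""
    loaded = loaded_module_version(proc_text)
    return loaded is not None and loaded != installed_version


if __name__ == "__main__":
    if PROC_FILE.exists():
        print("loaded kernel module:", loaded_module_version(PROC_FILE.read_text()))
    else:
        print("no NVIDIA kernel module information found")
```

If the two versions differ, a reboot (so that the matching kernel module gets loaded) is the usual remedy.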

I got the driver working, but now CUDA is not recognized :smiley:


mona@goku:~$  pip uninstall torch==1.8.0+cu111 torchvision==0.9.0+cu111 torchaudio==0.8.0 
Found existing installation: torch 1.8.0+cu111
Uninstalling torch-1.8.0+cu111:
  Would remove:
    /home/mona/venv/phosa/bin/convert-caffe2-to-onnx
    /home/mona/venv/phosa/bin/convert-onnx-to-caffe2
    /home/mona/venv/phosa/lib/python3.8/site-packages/caffe2/*
    /home/mona/venv/phosa/lib/python3.8/site-packages/torch-1.8.0+cu111.dist-info/*
    /home/mona/venv/phosa/lib/python3.8/site-packages/torch/*
Proceed (y/n)? y
  Successfully uninstalled torch-1.8.0+cu111
Found existing installation: torchvision 0.9.0+cu111
Uninstalling torchvision-0.9.0+cu111:
  Would remove:
    /home/mona/venv/phosa/lib/python3.8/site-packages/torchvision-0.9.0+cu111.dist-info/*
    /home/mona/venv/phosa/lib/python3.8/site-packages/torchvision.libs/libcudart.05b13ab8.so.11.0
    /home/mona/venv/phosa/lib/python3.8/site-packages/torchvision.libs/libjpeg.ceea7512.so.62
    /home/mona/venv/phosa/lib/python3.8/site-packages/torchvision.libs/libpng16.7f72a3c5.so.16
    /home/mona/venv/phosa/lib/python3.8/site-packages/torchvision.libs/libz.1328edc3.so.1
    /home/mona/venv/phosa/lib/python3.8/site-packages/torchvision/*
Proceed (y/n)? y
  Successfully uninstalled torchvision-0.9.0+cu111
Found existing installation: torchaudio 0.8.0
Uninstalling torchaudio-0.8.0:
  Would remove:
    /home/mona/venv/phosa/lib/python3.8/site-packages/torchaudio-0.8.0.dist-info/*
    /home/mona/venv/phosa/lib/python3.8/site-packages/torchaudio/*
Proceed (y/n)? y
  Successfully uninstalled torchaudio-0.8.0
(phosa) mona@goku:~/research/code/phosa$ nvidia-smi
Thu Mar 25 23:15:15 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.39       Driver Version: 460.39       CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce GTX 165...  Off  | 00000000:01:00.0 Off |                  N/A |
| N/A   40C    P0     7W /  N/A |     10MiB /  3911MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1239      G   /usr/lib/xorg/Xorg                  4MiB |
|    0   N/A  N/A      1802      G   /usr/lib/xorg/Xorg                  4MiB |
+-----------------------------------------------------------------------------+
(phosa) mona@goku:~/research/code/phosa$ pip install torch==1.8.1+cu111 torchvision==0.9.1+cu111 torchaudio==0.8.1 -f https://download.pytorch.org/whl/torch_stable.html
Looking in links: https://download.pytorch.org/whl/torch_stable.html
Collecting torch==1.8.1+cu111
  Downloading https://download.pytorch.org/whl/cu111/torch-1.8.1%2Bcu111-cp38-cp38-linux_x86_64.whl (1982.2 MB)
     |████████████████████████████████| 1982.2 MB 21 kB/s 
Collecting torchvision==0.9.1+cu111
  Downloading https://download.pytorch.org/whl/cu111/torchvision-0.9.1%2Bcu111-cp38-cp38-linux_x86_64.whl (17.6 MB)
     |████████████████████████████████| 17.6 MB 11.3 MB/s 
Collecting torchaudio==0.8.1
  Downloading torchaudio-0.8.1-cp38-cp38-manylinux1_x86_64.whl (1.9 MB)
     |████████████████████████████████| 1.9 MB 5.1 MB/s 
Requirement already satisfied: typing-extensions in /home/mona/venv/phosa/lib/python3.8/site-packages (from torch==1.8.1+cu111) (3.7.4.3)
Requirement already satisfied: numpy in /home/mona/venv/phosa/lib/python3.8/site-packages (from torch==1.8.1+cu111) (1.19.5)
Requirement already satisfied: pillow>=4.1.1 in /home/mona/venv/phosa/lib/python3.8/site-packages (from torchvision==0.9.1+cu111) (8.1.0)
Installing collected packages: torch, torchvision, torchaudio
Successfully installed torch-1.8.1+cu111 torchaudio-0.8.1 torchvision-0.9.1+cu111
mona@goku:~/research/code/phosa$ python
Python 3.8.5 (default, Jan 27 2021, 15:41:15) 
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.cuda.is_available()
/home/mona/venv/phosa/lib/python3.8/site-packages/torch/cuda/__init__.py:52: UserWarning: CUDA initialization: CUDA unknown error - this may be due to an incorrectly set up environment, e.g. changing env variable CUDA_VISIBLE_DEVICES after program start. Setting the available devices to be zero. (Triggered internally at  /pytorch/c10/cuda/CUDAFunctions.cpp:109.)
  return torch._C._cuda_getDeviceCount() > 0
False
>>>

Then I rebooted, and now it is working.

How I installed the new driver: https://askubuntu.com/a/1326173/165324

Since your system seems to update drivers behind your back quite often (which you don’t seem to want), you could disable these automatic updates and update the drivers manually when needed.

I’m using Windows 11. I used the command from your post:
pip install torch==1.8.0+cu111 torchvision==0.9.0+cu111 torchaudio==0.8.0 -f https://download.pytorch.org/whl/torch_stable.html
and it fixed my issue.

Thanks!
