UserWarning: CUDA initialization: CUDA unknown error - this may be due to an incorrectly set up environment, e.g. changing env variable CUDA_VISIBLE_DEVICES after program start. Setting the available devices to be zero

This warning suddenly started appearing whenever I run torch.cuda.is_available():

UserWarning: CUDA initialization: CUDA unknown error - this may be due to an incorrectly set up environment, e.g. changing env variable CUDA_VISIBLE_DEVICES after program start. Setting the available devices to be zero. (Triggered internally at /opt/conda/conda-bld/pytorch_1603729009598/work/c10/cuda/CUDAFunctions.cpp:100.)

Output of collect_env.py

Collecting environment information…
PyTorch version: 1.7.0
Is debug build: True
CUDA used to build PyTorch: 10.1
ROCM used to build PyTorch: N/A

OS: Ubuntu 20.04.2 LTS (x86_64)
GCC version: (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
Clang version: Could not collect
CMake version: Could not collect
Libc version: glibc-2.31

Python version: 3.8.8 | packaged by conda-forge | (default, Feb 20 2021, 16:22:27) [GCC 9.3.0] (64-bit runtime)
Python platform: Linux-5.11.0-25-generic-x86_64-with-glibc2.10
Is CUDA available: False
CUDA runtime version: 10.1.243
GPU models and configuration: GPU 0: GeForce GTX 1080 Ti
Nvidia driver version: 450.119.03
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] geotorch==0.2.0
[pip3] numpy==1.19.2
[pip3] numpydoc==1.1.0
[pip3] torch==1.7.0
[pip3] torch-cluster==1.5.8
[pip3] torch-geometric==1.6.3
[pip3] torch-geometric-temporal==0.0.11
[pip3] torch-scatter==2.0.5
[pip3] torch-sparse==0.6.8
[pip3] torch-spline-conv==1.2.0
[pip3] torchaudio==0.7.0a0+ac17b64
[pip3] torchcontrib==0.0.2
[pip3] torchdiffeq==0.2.1
[pip3] torchvision==0.8.1
[conda] blas 1.0 mkl
[conda] cudatoolkit 10.1.243 h6bb024c_0
[conda] geotorch 0.2.0 pypi_0 pypi
[conda] mkl 2020.4 h726a3e6_304 conda-forge
[conda] mkl-service 2.3.0 py38h1e0a361_2 conda-forge
[conda] mkl_fft 1.3.0 py38h5c078b8_1 conda-forge
[conda] mkl_random 1.2.0 py38hc5bc63f_1 conda-forge
[conda] numpy 1.19.2 py38h54aff64_0
[conda] numpy-base 1.19.2 py38hfa32c7d_0
[conda] numpydoc 1.1.0 py_1 conda-forge
[conda] pytorch 1.7.0 py3.8_cuda10.1.243_cudnn7.6.3_0 pytorch
[conda] torch-cluster 1.5.8 pypi_0 pypi
[conda] torch-geometric 1.6.3 pypi_0 pypi
[conda] torch-geometric-temporal 0.0.11 pypi_0 pypi
[conda] torch-scatter 2.0.5 pypi_0 pypi
[conda] torch-sparse 0.6.8 pypi_0 pypi
[conda] torch-spline-conv 1.2.0 pypi_0 pypi
[conda] torchaudio 0.7.0 py38 pytorch
[conda] torchdiffeq 0.2.1 pypi_0 pypi
[conda] torchvision 0.8.1 py38_cu101 pytorch

Output of nvcc --version

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Sun_Jul_28_19:07:16_PDT_2019
Cuda compilation tools, release 10.1, V10.1.243

Finally, output of nvidia-smi

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.119.03 Driver Version: 450.119.03 CUDA Version: 11.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 GeForce GTX 108… Off | 00000000:04:00.0 On | N/A |
| 14% 51C P5 12W / 250W | 255MiB / 11177MiB | 1% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 933 G /usr/lib/xorg/Xorg 35MiB |
| 0 N/A N/A 1506 G /usr/lib/xorg/Xorg 78MiB |
| 0 N/A N/A 1632 G /usr/bin/gnome-shell 126MiB |
| 0 N/A N/A 3021 G /usr/lib/firefox/firefox 2MiB |
+-----------------------------------------------------------------------------+

Any help would be appreciated.

This error is raised if your system cannot communicate with the GPU, which might be caused by, e.g., a driver update without a restart or another setup issue.
On my personal workstation I see this issue after waking the system from “suspend”, which still seems to trigger it (after a restart it works again).
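
To tell a failed CUDA initialization apart from a machine that simply has no usable GPU, it can help to capture the warning text together with the boolean result. The following is a minimal sketch, not an official diagnostic: check_cuda is a hypothetical helper, and it assumes that PyTorch surfaces the initialization problem as a Python warning, as in the message quoted at the top of this thread.

```python
import warnings


def check_cuda(probe):
    """Call probe() and return (available, list of warning messages).

    `probe` is any zero-argument callable, e.g. torch.cuda.is_available.
    check_cuda is a hypothetical helper, not part of PyTorch.
    """
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # record every warning, even repeats
        available = bool(probe())
    return available, [str(w.message) for w in caught]


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed in this environment")
    else:
        ok, messages = check_cuda(torch.cuda.is_available)
        print("CUDA available:", ok)
        for msg in messages:
            print("warning:", msg)
```

If the result is False and the message list contains "CUDA unknown error", the driver is probably in the broken state discussed here rather than the GPU being absent.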

Thanks for the reply. Unfortunately, restarting my machine doesn’t resolve the issue.

Hello, this issue also happens when I wake my Ubuntu 22.04 system and run torch.cuda.is_available().
If I reboot, it works again. How can I fix it without rebooting the system?
My GPU is an RTX 3090 with the newest driver, 515.43.
Thank you!

You could try to execute:

sudo rmmod nvidia_uvm
sudo modprobe nvidia_uvm

which helps on my Ubuntu system after it was suspended.

Thank you for your reply!
I tried the two commands but they did not work.
If I run torch.cuda.is_available(), CUDA reports the same problem. Maybe it is a bug in the power management of the NVIDIA driver?

Yeah, I think it’s a known issue in the interaction between the “Suspend” mode and the driver.
When I have IDEs open, I sometimes get the error “rmmod: ERROR: Module nvidia_uvm is in use” and cannot reset the GPU(s). In that case I unfortunately have to reboot, but roughly 9 out of 10 times these two commands do the job and I can properly use the GPU again.
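
Whether rmmod is likely to fail can be checked beforehand, because the kernel lists a use count for every loaded module in /proc/modules (the third field of each line). A rough sketch, assuming a Linux system; module_use_count is a hypothetical helper name, and a count of zero does not guarantee that reloading the module restores CUDA:

```python
from pathlib import Path


def module_use_count(name, modules_text):
    """Return the use count of kernel module `name`, or None if not loaded.

    `modules_text` is the content of /proc/modules, where each line reads
    "<name> <size> <use count> <dependents> <state> <address>".
    """
    for line in modules_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[0] == name:
            return int(fields[2])
    return None


if __name__ == "__main__":
    try:
        text = Path("/proc/modules").read_text()
    except OSError:
        print("/proc/modules is not readable (not a Linux system?)")
    else:
        count = module_use_count("nvidia_uvm", text)
        if count is None:
            print("nvidia_uvm is not loaded")
        elif count == 0:
            print("nvidia_uvm has no users; rmmod should not report 'in use'")
        else:
            print(f"nvidia_uvm is in use (count {count}); close GPU programs first")
```

A non-zero count usually means some process (for example an IDE with a running Python session) still holds the module, which matches the “in use” error above.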

I still get the error after trying these two commands many times, and torch.cuda.is_available() still returns False.

Anyone who still has this issue, try:

sudo apt-get install nvidia-modprobe

This worked for me!
Source: RuntimeError: CUDA unknown error · Issue #49081 · pytorch/pytorch · GitHub
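
One way to see whether this fix applies is to check for the nvidia-modprobe helper and for the NVIDIA device nodes under /dev before and after installing it. This is only a sketch: the node names below are the ones the NVIDIA Linux driver commonly uses, GPU index 0 is an assumption, and missing_paths is a hypothetical helper.

```python
import os
import shutil

# Device nodes commonly created by the NVIDIA Linux driver (assumed: GPU 0).
DEVICE_NODES = ["/dev/nvidiactl", "/dev/nvidia0", "/dev/nvidia-uvm"]


def missing_paths(paths, exists=os.path.exists):
    """Return the entries of `paths` for which `exists` is false."""
    return [p for p in paths if not exists(p)]


if __name__ == "__main__":
    helper = shutil.which("nvidia-modprobe")
    print("nvidia-modprobe:", helper or "not found on PATH")
    missing = missing_paths(DEVICE_NODES)
    if missing:
        print("missing device nodes:", ", ".join(missing))
    else:
        print("all expected device nodes are present")
```

If /dev/nvidia-uvm is reported missing while the driver itself is loaded, that would be consistent with the situation this package is meant to address.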

This seemed to work for me too!