No more cuda available after installing last nvidia drivers

ihaddy · June 5, 2024, 6:27pm

Hey just wanted to chime in here (Patrick I think you’re doing a great job communicating):

I’m also wondering if it’s an Nvidia driver update problem.

ptrblck:

conda install pytorch torchvision torchaudio pytorch-cuda=12.4 -c pytorch-nightly -c nvidia
...
## Package Plan ##

...

  added / updated specs:
    - pytorch
    - pytorch-cuda=12.4
    - torchaudio
    - torchvision
...
The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
...
    cuda-cudart-12.4.127       |                0         198 KB  nvidia
    cuda-cupti-12.4.127        |                0        16.4 MB  nvidia
    cuda-libraries-12.4.0      |                0           2 KB  nvidia
    cuda-nvrtc-12.4.127        |                0        21.0 MB  nvidia
    cuda-nvtx-12.4.127         |                0          58 KB  nvidia
    cuda-opencl-12.4.127       |                0          12 KB  nvidia
    cuda-runtime-12.4.0        |                0           2 KB  nvidia
...
    pytorch-2.4.0.dev20240527  |py3.10_cuda12.4_cudnn8.9.2_0        1.38 GB  pytorch-nightly
    pytorch-cuda-12.4          |       hc786d27_6           7 KB  pytorch-nightly
    pyyaml-6.0.1               |  py310h2372a71_1         167 KB  conda-forge
    requests-2.32.2            |     pyhd8ed1ab_0          57 KB  conda-forge
    sympy-1.12                 | pypyh9d50eac_103         4.1 MB  conda-forge
    tbb-2021.12.0              |       h297d8ca_1         190 KB  conda-forge
    torchaudio-2.2.0.dev20240527|      py310_cu124         6.2 MB  pytorch-nightly
    torchtriton-3.0.0+45fff310c8|            py310       250.5 MB  pytorch-nightly
    torchvision-0.19.0.dev20240527|      py310_cu124         8.3 MB  pytorch-nightly
...
python -c "import torch; print(torch.__version__); print(torch.version.cuda); print(torch.randn(1).cuda())"
2.4.0.dev20240527
12.4
tensor([1.2858], device='cuda:0')

I ran the same commands on my Windows machine and I get:


conda install pytorch torchvision torchaudio pytorch-cuda=12.4 -c pytorch-nightly -c nvidia

(ifwhisper-cuda) PS C:\Users\123> python -c "import torch; print(torch.__version__); print(torch.version.cuda); print(torch.randn(1).cuda())"
2.4.0.dev20240605
None
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Users\123\miniconda3\envs\ifwhisper-cuda\lib\site-packages\torch\cuda\__init__.py", line 301, in _lazy_init
    raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled

I just updated my drivers today too, which I think was a mistake lol. nvidia-smi shows:

NVIDIA-SMI 555.99 Driver Version: 555.99 CUDA Version: 12.5

As an update, I tried it on a Linux machine with an RTX 3080 and the NVIDIA 550 driver, and I get the same results:

(ifwhisper-cuda) isaac@Isaac:~$ python -c "import torch; print(torch.__version__); print(torch.version.cuda); print(torch.randn(1).cuda())"
2.4.0.dev20240605
None
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/isaac/miniconda3/envs/ifwhisper-cuda/lib/python3.9/site-packages/torch/cuda/__init__.py", line 301, in _lazy_init
    raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled

nvidia-smi output

| NVIDIA-SMI 550.78                 Driver Version: 550.78         CUDA Version: 12.4     |

So I’m actually not sure what the problem is related to.

ptrblck · June 5, 2024, 7:04pm

Your output shows that CPU-only PyTorch binaries were installed, so the lack of GPU support is unrelated to the NVIDIA driver.

Same here: torch.version.cuda returns None, while the GPU-enabled binaries would show the CUDA runtime version they depend on.
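A minimal sketch of that check, for anyone who wants to tell the two failure modes in this thread apart. It assumes only what the outputs above show: `torch.version.cuda` is `None` on a CPU-only build, and a version string such as `12.4` on a CUDA build. The helper name `classify_torch_build` is made up for illustration, and it takes the two values as arguments so it can be tried without a GPU.

```python
from typing import Optional


def classify_torch_build(version_cuda: Optional[str], cuda_available: bool) -> str:
    """Classify an install from torch.version.cuda and torch.cuda.is_available()."""
    if version_cuda is None:
        # No CUDA runtime compiled in, so the NVIDIA driver is not the cause.
        # (ROCm builds also report None here; this sketch does not handle them.)
        return "cpu-only build"
    if cuda_available:
        return "cuda build, gpu usable"
    # CUDA support is compiled in, but initialization failed
    # (driver problem, no visible device, ...).
    return "cuda build, gpu not usable"


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("torch is not installed in this environment")
    else:
        print(torch.__version__, torch.version.cuda)
        print(classify_torch_build(torch.version.cuda, torch.cuda.is_available()))
```

With the values posted above (`None` for `torch.version.cuda`), this returns "cpu-only build", which points at the installed package rather than the driver.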

ihaddy · June 7, 2024, 2:24pm

Thanks for the reply!

So this may be because I’m relatively new to working with PyTorch, but were the commands you linked the GPU PyTorch or CPU PyTorch install commands? I copied your commands directly, so I assumed they would be GPU-enabled PyTorch. Do I need to install that with separate commands?

Again, thanks for the help!

ptrblck · June 7, 2024, 2:51pm

The command installs the CUDA-enabled conda binary, as seen in the install matrix.
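One way to see which variant conda actually resolved is the build string in the `conda list pytorch` output. The sketch below classifies such a string. The CUDA example is the build shown earlier in this thread (`py3.10_cuda12.4_cudnn8.9.2_0`); the CPU form `py3.9_cpu_0` is an assumption about how CPU-only builds are labeled, and `build_variant` is a hypothetical helper name.

```python
def build_variant(build: str) -> str:
    """Guess the variant of a pytorch conda package from its build string."""
    tokens = build.lower().split("_")
    if any(token.startswith("cuda") for token in tokens):
        return "cuda"
    if "cpu" in tokens:
        # Assumed label for CPU-only builds, e.g. "py3.9_cpu_0".
        return "cpu"
    # Other packages use different schemes (torchaudio above uses "py310_cu124").
    return "unknown"


if __name__ == "__main__":
    for build in ("py3.10_cuda12.4_cudnn8.9.2_0", "py3.9_cpu_0"):
        print(build, "->", build_variant(build))
```

If the string comes back as "cpu" in an environment where the CUDA command was used, the solver picked a different package than intended.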

Kang_Xiaoning · June 13, 2024, 10:53am

I didn’t put too much time into this issue since reverting to 550 fixed everything, but something must be incorrect on the PyTorch, Debian, or NVIDIA side.

I used Debian apt to install the NVIDIA drivers, as the OP does. After a brainless “apt update; apt upgrade”, both the NVIDIA driver and the support libraries were updated to version 555. I rebooted the system to make sure everything took effect.

Then all existing PyTorch environments broke, as people said in this thread.

(akicomfy) dl@kxn-deepl:~$ python
Python 3.10.14 (main, May 6 2024, 19:42:50) [GCC 11.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> print(torch.cuda.is_available())
/home/dl/miniconda3/envs/akicomfy/lib/python3.10/site-packages/torch/cuda/__init__.py:118: UserWarning: CUDA initialization: CUDA unknown error - this may be due to an incorrectly set up environment, e.g. changing env variable CUDA_VISIBLE_DEVICES after program start. Setting the available devices to be zero. (Triggered internally at …/c10/cuda/CUDAFunctions.cpp:108.)
  return torch._C._cuda_getDeviceCount() > 0
False

I used conda to create another new environment and installed the PyTorch nightly as the PyTorch site suggests. The command was:

pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu124

The result was the same:

Successfully installed MarkupSafe-2.1.5 filelock-3.13.1 fsspec-2024.2.0 jinja2-3.1.3 mpmath-1.2.1 networkx-3.2.1 numpy-1.26.4 nvidia-cublas-cu12-12.4.2.65 nvidia-cuda-cupti-cu12-12.4.99 nvidia-cuda-nvrtc-cu12-12.4.99 nvidia-cuda-runtime-cu12-12.4.99 nvidia-cudnn-cu12-9.1.0.70 nvidia-cufft-cu12-11.2.0.44 nvidia-curand-cu12-10.3.5.119 nvidia-cusolver-cu12-11.6.0.99 nvidia-cusparse-cu12-12.3.0.142 nvidia-nccl-cu12-2.20.5 nvidia-nvjitlink-cu12-12.4.99 nvidia-nvtx-cu12-12.4.99 pillow-9.3.0 pytorch-triton-3.0.0+45fff310c8 sympy-1.12 torch-2.4.0.dev20240610+cu124 torchaudio-2.4.0.dev20240610+cu124 torchvision-0.19.0.dev20240610+cu124 typing-extensions-4.8.0
(test555) dl@kxn-deepl:~$ python
Python 3.10.14 (main, May 6 2024, 19:42:50) [GCC 11.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> print(torch.cuda.is_available())
/home/dl/miniconda3/envs/test555/lib/python3.10/site-packages/torch/cuda/__init__.py:128: UserWarning: CUDA initialization: CUDA unknown error - this may be due to an incorrectly set up environment, e.g. changing env variable CUDA_VISIBLE_DEVICES after program start. Setting the available devices to be zero. (Triggered internally at …/c10/cuda/CUDAFunctions.cpp:108.)
  return torch._C._cuda_getDeviceCount() > 0
False

Reverting everything related to the NVIDIA driver to version 550 using aptitude fixed everything.
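Since the working and failing setups in this thread differ only in the driver version, it can help to record that version alongside any bug report. Below is a small sketch that pulls the driver and CUDA versions out of an `nvidia-smi` header line like the ones pasted above. The regular expression is based only on those two pasted lines, so other `nvidia-smi` layouts may not match; `parse_smi_header` and `driver_major` are hypothetical helper names.

```python
import re
from typing import Optional, Tuple

# Pattern derived from the header lines quoted in this thread.
_HEADER = re.compile(
    r"NVIDIA-SMI\s+[\d.]+\s+"
    r"Driver Version:\s*(?P<driver>[\d.]+)\s+"
    r"CUDA Version:\s*(?P<cuda>[\d.]+)"
)


def parse_smi_header(line: str) -> Optional[Tuple[str, str]]:
    """Return (driver_version, cuda_version), or None if the line does not match."""
    match = _HEADER.search(line)
    if match is None:
        return None
    return match.group("driver"), match.group("cuda")


def driver_major(driver_version: str) -> int:
    """Major branch of the driver, e.g. 550 for '550.78'."""
    return int(driver_version.split(".")[0])


if __name__ == "__main__":
    sample = "| NVIDIA-SMI 550.78                 Driver Version: 550.78         CUDA Version: 12.4     |"
    parsed = parse_smi_header(sample)
    if parsed is not None:
        print(parsed, driver_major(parsed[0]))
```

The function only parses text, so the header line can be pasted in from any machine.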

ptrblck · June 13, 2024, 12:33pm

I wouldn’t recommend “brainlessly” updating to a beta driver if you don’t want to spend time on it or if you are not looking to verify a specific fix or feature.

With that being said, it’s good to hear that 550 works for you!

StorywithLove · July 1, 2024, 8:39am

Oh, thanks, it works for me.
At first I installed the NVIDIA driver for CUDA 12.4 (nvidia-smi), and that install command didn’t work.
Later I installed the CUDA toolkit for 12.4 and set the environment variable (nvcc -V), and that command worked.
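Both commands mentioned here, `nvidia-smi` and `nvcc -V`, only run if the executable can be found on `PATH`. A quick sketch for checking that from Python, using the standard library's `shutil.which`; the helper name `find_cuda_tools` is made up for illustration.

```python
import shutil
from typing import Dict, Iterable, Optional


def find_cuda_tools(names: Iterable[str] = ("nvidia-smi", "nvcc")) -> Dict[str, Optional[str]]:
    """Map each tool name to its resolved path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for tool, path in find_cuda_tools().items():
        print(f"{tool}: {path if path else 'not found on PATH'}")
```

A `None` for `nvcc` usually just means the toolkit's `bin` directory is not in the environment variable yet.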