PyTorch cannot find CUDA devices on Ubuntu 22.04 with 3090 Ti GPUs

Hello PyTorch,
I am trying to build neural network models on the 3090 Ti GPUs, but PyTorch cannot find any CUDA devices. My system is Ubuntu 22.04.1 LTS with kernel 5.15.0-56-generic. The output of the PyTorch environment collection script is:

Collecting environment information...
PyTorch version: 1.13.1
Is debug build: False
CUDA used to build PyTorch: 11.7
ROCM used to build PyTorch: N/A

OS: Ubuntu 22.04.1 LTS (x86_64)
GCC version: (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0
Clang version: Could not collect
CMake version: version 3.22.1
Libc version: glibc-2.35

Python version: 3.10.8 (main, Nov 24 2022, 14:13:03) [GCC 11.2.0] (64-bit runtime)
Python platform: Linux-5.15.0-56-generic-x86_64-with-glibc2.35
Is CUDA available: False
CUDA runtime version: 11.7.99
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: 
GPU 0: NVIDIA GeForce RTX 3090 Ti
GPU 1: NVIDIA GeForce RTX 3090 Ti

Nvidia driver version: 525.60.11
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

Versions of relevant libraries:
[pip3] numpy==1.23.4
[pip3] torch==1.13.1
[pip3] torchaudio==0.13.1
[pip3] torchvision==0.14.1
[conda] blas                      1.0                         mkl  
[conda] ffmpeg                    4.3                  hf484d3e_0    pytorch
[conda] mkl                       2021.4.0           h06a4308_640  
[conda] mkl-service               2.4.0           py310h7f8727e_0  
[conda] mkl_fft                   1.3.1           py310hd6ae3a3_0  
[conda] mkl_random                1.2.2           py310h00e6091_0  
[conda] numpy                     1.23.4          py310hd5efca6_0  
[conda] numpy-base                1.23.4          py310h8e6c178_0  
[conda] pytorch                   1.13.1          py3.10_cuda11.7_cudnn8.5.0_0    pytorch
[conda] pytorch-cuda              11.7                 h67b0de4_1    pytorch
[conda] pytorch-mutex             1.0                        cuda    pytorch
[conda] torchaudio                0.13.1              py310_cu117    pytorch
[conda] torchvision               0.14.1              py310_cu117    pytorch

I have tested PyTorch versions from 1.6 to 1.13 with different CUDA versions, and also Python 3.7, 3.8, 3.9, and 3.10, but the results are the same.
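For reference, the check I keep running boils down to something like this (a minimal sketch; the helper name `collect_cuda_diagnostics` is mine, and it degrades gracefully when torch is not importable):

```python
# Minimal diagnostic sketch: collect what PyTorch reports about CUDA.
# The helper name is hypothetical, not part of any PyTorch API.

def collect_cuda_diagnostics():
    info = {}
    try:
        import torch
    except ImportError:
        info["torch"] = None  # PyTorch not installed in this environment
        return info
    info["torch"] = torch.__version__
    info["built_with_cuda"] = torch.version.cuda        # CUDA the binary was built against
    info["cuda_available"] = torch.cuda.is_available()  # False in my case
    if info["cuda_available"]:
        info["devices"] = [torch.cuda.get_device_name(i)
                           for i in range(torch.cuda.device_count())]
    return info

if __name__ == "__main__":
    for key, value in collect_cuda_diagnostics().items():
        print(f"{key}: {value}")
```

On my machine this prints `cuda_available: False` even though `built_with_cuda` reports 11.7.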

Here are the outputs for the NVIDIA driver and the CUDA toolkit version. Thanks a lot!

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.60.11    Driver Version: 525.60.11    CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:01:00.0  On |                  Off |
|  0%   45C    P8     9W / 450W |    432MiB / 24564MiB |     10%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA GeForce ...  Off  | 00000000:02:00.0 Off |                  Off |
|  0%   34C    P8    13W / 450W |      6MiB / 24564MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      2305      G   /usr/lib/xorg/Xorg                195MiB |
|    0   N/A  N/A      2442      G   /usr/bin/gnome-shell               64MiB |
|    0   N/A  N/A      3108    C+G   ...753587327527835605,131072      170MiB |
|    1   N/A  N/A      2305      G   /usr/lib/xorg/Xorg                  4MiB |
+-----------------------------------------------------------------------------+

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Jun__8_16:49:14_PDT_2022
Cuda compilation tools, release 11.7, V11.7.99
Build cuda_11.7.r11.7/compiler.31442593_0

I have tested with the 11.7, 11.6, and 12.0 CUDA toolkits.

Was this workstation working before? If so, what did you change that could have introduced this issue? E.g. did you update any drivers without rebooting the system?
In case this is a new setup which wasn’t working previously, I would recommend running any CUDA sample first and making sure it works. If it doesn’t, check whether your drivers are properly installed and reinstall them if in doubt.
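One quick way to test the driver independently of PyTorch is to talk to `libcuda` directly. This is a diagnostic sketch using `ctypes` against the CUDA driver API (`cuInit`, `cuDeviceGetCount`); it reports a status string instead of raising, so it is safe to run on any box:

```python
import ctypes

def probe_cuda_driver():
    """Query the NVIDIA driver directly via libcuda, bypassing PyTorch.

    Returns a human-readable status string. This is a diagnostic sketch,
    not an official tool.
    """
    try:
        libcuda = ctypes.CDLL("libcuda.so.1")
    except OSError:
        return "libcuda.so.1 not found - driver not installed or not on the loader path"
    err = libcuda.cuInit(0)
    if err != 0:  # CUDA_SUCCESS == 0
        return f"cuInit failed with error {err} - driver installed but not working"
    count = ctypes.c_int(0)
    err = libcuda.cuDeviceGetCount(ctypes.byref(count))
    if err != 0:
        return f"cuDeviceGetCount failed with error {err}"
    return f"driver OK, {count.value} CUDA device(s) visible"

print(probe_cuda_driver())
```

If this already fails at `cuInit`, the problem is in the driver installation, not in PyTorch.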

Hello @ptrblck, thanks for your reply. This workstation was working before with dual 3090 GPUs installed. Now it has 3090 Ti GPUs instead. I have updated the driver and rebooted Ubuntu, but it still shows the same results.

Your nvidia-smi output shows you have CUDA 12.0 installed. I don’t see that as an option for PyTorch on the install page. Maybe you can downgrade to a lower CUDA version, such as 11.7, so it matches the version PyTorch was built against.

The PyTorch binaries ship with their own CUDA runtime and as long as your NVIDIA driver is new enough, PyTorch will run.
Your locally installed CUDA toolkit (12.0 in this case) will only be used if you are building PyTorch from source or a custom CUDA extension.
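The "new enough" requirement can be checked mechanically. As a sketch (the minimum-driver table below is an assumption based on NVIDIA's published compatibility matrix; verify the exact numbers against the CUDA release notes for your version):

```python
# Hypothetical helper: check whether an installed NVIDIA driver meets the
# minimum required for a given CUDA runtime. The version table is an
# assumption drawn from NVIDIA's compatibility notes - double-check it.

# Minimum Linux driver per CUDA major version, *with* CUDA minor-version
# compatibility (any 11.x runtime runs on a >= 450.80.02 driver).
MIN_LINUX_DRIVER = {
    "11": (450, 80, 2),
    "12": (525, 60, 13),
}

def parse_version(v):
    return tuple(int(part) for part in v.split("."))

def driver_supports(driver_version, cuda_runtime):
    major = cuda_runtime.split(".")[0]
    return parse_version(driver_version) >= MIN_LINUX_DRIVER[major]

# The setup from this thread: driver 525.60.11 with the CUDA 11.7 runtime
# bundled in the PyTorch binaries - comfortably new enough.
print(driver_supports("525.60.11", "11.7"))  # True
```

So on this machine the driver is not the limiting factor for the CUDA 11.7 binaries; the locally installed 12.0 toolkit is irrelevant here.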

This sounds like a driver/setup issue then as you would often run into such issues when trying to “hot swap” GPUs.

That could be the issue. However, I reinstalled Ubuntu entirely after swapping the GPUs. The only difference is that I upgraded from Ubuntu 20.04 to 22.04 due to a hardware issue (my new motherboard supports WiFi 6, which Ubuntu 20.04 doesn’t support). So I’m wondering: does PyTorch only support 20.04 and not 22.04?

No, I’m not aware of any limitations using Ubuntu 22.04, but am not using it myself.
However, did you try to run any other CUDA sample as I’ve suggested before?
I don’t think it’s useful to focus on PyTorch if your general setup is not working at all, so could you verify it first?

I upgraded from Ubuntu 20.04 to 22.04 without any PyTorch problems with the internal GPUs on two different laptops (RTX A2000 and 3080 Ti).


The problem was solved by completely reinstalling Ubuntu 22.04 and then installing the newest NVIDIA driver. Thank you so much for all of your suggestions and recommendations.