PyTorch cannot find CUDA version 11.0 for NVIDIA GeForce RTX 2070 SUPER

PyTorch cannot find CUDA on my GPU-enabled machine. The command !python -m torch.utils.collect_env returns the following information:

Collecting environment information...
/home/user/anaconda3/envs/tf-gpu/lib/python3.6/site-packages/torch/cuda/__init__.py:52: UserWarning: CUDA initialization: CUDA unknown error - this may be due to an incorrectly set up environment, e.g. changing env variable CUDA_VISIBLE_DEVICES after program start. Setting the available devices to be zero. (Triggered internally at  /opt/conda/conda-bld/pytorch_1607370120218/work/c10/cuda/CUDAFunctions.cpp:100.)
  return torch._C._cuda_getDeviceCount() > 0
PyTorch version: 1.7.1
Is debug build: False
CUDA used to build PyTorch: 11.0
ROCM used to build PyTorch: N/A

OS: Ubuntu 20.04.1 LTS (x86_64)
GCC version: (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0
Clang version: Could not collect
CMake version: version 3.16.3

Python version: 3.6 (64-bit runtime)
Is CUDA available: False
CUDA runtime version: 10.1.243
GPU models and configuration: GPU 0: GeForce RTX 2070 SUPER
Nvidia driver version: 450.119.03
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] numpy==1.19.2
[pip3] torch==1.7.1
[pip3] torchaudio==0.7.0a0+a853dff
[pip3] torchvision==0.8.2
[conda] blas                      1.0                         mkl  
[conda] cudatoolkit               11.0.221             h6bb024c_0  
[conda] mkl                       2020.2                      256  
[conda] mkl-service               2.3.0            py36he8ac12f_0  
[conda] mkl_fft                   1.3.0            py36h54f3939_0  
[conda] mkl_random                1.1.1            py36h0573a6f_0  
[conda] numpy                     1.19.2           py36h54aff64_0  
[conda] numpy-base                1.19.2           py36hfa32c7d_0  
[conda] pytorch                   1.7.1           py3.6_cuda11.0.221_cudnn8.0.5_0    pytorch
[conda] torchaudio                0.7.2                      py36    pytorch
[conda] torchvision               0.8.2                py36_cu110    pytorch

torch.cuda.is_available() returned False and print(torch.backends.cudnn.enabled) returned True.
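
For reference, the exact calls and outputs inside the tf-gpu environment look like this:

>>> import torch
>>> torch.cuda.is_available()
False
>>> torch.backends.cudnn.enabled
True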

I can see the following specification using the nvidia-smi command.

The GPU in my machine is NVIDIA Corporation TU104 [GeForce RTX 2070 SUPER].

How can I solve this problem?

It seems you are running into a version mismatch, so maybe try to create an empty and clean virtual environment and install the latest PyTorch release there. If PyTorch still has issues communicating with your GPU, you might need to update the NVIDIA driver; CUDA 12.x requires driver >=525.60.13 on Linux.
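
Since the warning in your log mentions CUDA_VISIBLE_DEVICES, it might also be worth checking that this variable is not set to something unexpected before torch is imported. A minimal check, assuming you run it in a fresh interpreter inside the same environment, could look like this:

>>> import os
>>> os.environ.get("CUDA_VISIBLE_DEVICES")  # expected: None (unset) or a valid GPU id such as "0"
>>> import torch
>>> torch.cuda.device_count()               # 0 would mean the CUDA initialization still fails

If the variable is set to an empty string or an invalid id, unset it and try again.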

In the tf-gpu environment (the user-created environment name) I ran
conda install pytorch==1.7.1 torchvision==0.8.2 torchaudio==0.7.2 cudatoolkit=11.0 -c pytorch
to install PyTorch.

Also, the latest PyTorch requires Python 3.8 or later and CUDA 11.8. My CUDA version is not compatible with the latest PyTorch requirements, so I installed an archived version.

If I do not update the driver, which PyTorch version do I need to choose?

Your locally installed CUDA toolkit won’t be used as the PyTorch binaries ship with their own CUDA runtime. You would need to properly install a supported NVIDIA driver to execute PyTorch workloads on the GPU.
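
To make the distinction concrete: torch.version.cuda reports the CUDA runtime the binaries were built with, while nvidia-smi reports the driver (and the highest CUDA version that driver supports). A quick comparison, assuming nvidia-smi is on your PATH, could look like this; the outputs correspond to the values in your collect_env log:

>>> import subprocess, torch
>>> torch.version.cuda  # CUDA runtime shipped inside the PyTorch binary
'11.0'
>>> print(subprocess.check_output(["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"]).decode())
450.119.03

The locally installed CUDA 10.1 toolkit reported as "CUDA runtime version" does not enter this picture at all.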

Was this setup working before at one point or is this a new workstation?

This is the first time I have used PyTorch with a CUDA setup. I found that PyTorch 1.7 with CUDA 11.0 can be installed, since a conda command for that version is available in the PyTorch archive, so I tried it.

I am wondering: if PyTorch 1.7 and CUDA 11.0 are compatible, why does it not work for me?

I don’t know why it’s not working for you.
Using 470.82.01 as the driver and installing PyTorch via conda install pytorch==1.7.0 cudatoolkit=11.0 -c pytorch into a new and empty environment works for me:

>>> import torch
>>> torch.__version__
'1.7.0'
>>> torch.version.cuda
'11.0'
>>> torch.randn(1).cuda()
tensor([0.8996], device='cuda:0')

I downgraded the CUDA version using the command

conda install pytorch==1.7.1 torchvision==0.8.2 torchaudio==0.7.2 cudatoolkit=10.1 -c pytorch

But it is still unable to find CUDA.

!python -m torch.utils.collect_env
PyTorch version: 1.7.1
Is debug build: False
CUDA used to build PyTorch: 10.1
ROCM used to build PyTorch: N/A

OS: Ubuntu 20.04.1 LTS (x86_64)
GCC version: (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0
Clang version: Could not collect
CMake version: version 3.16.3

Python version: 3.6 (64-bit runtime)
Is CUDA available: False
CUDA runtime version: 10.1.243
GPU models and configuration: GPU 0: GeForce RTX 2070 SUPER
Nvidia driver version: 450.119.03
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] numpy==1.19.2
[pip3] torch==1.7.1
[pip3] torchaudio==0.7.0a0+a853dff
[pip3] torchvision==0.8.2
[conda] blas                      1.0                         mkl  
[conda] cudatoolkit               10.1.243             h6bb024c_0  
[conda] mkl                       2020.2                      256  
[conda] mkl-service               2.3.0            py36he8ac12f_0  
[conda] mkl_fft                   1.3.0            py36h54f3939_0  
[conda] mkl_random                1.1.1            py36h0573a6f_0  
[conda] numpy                     1.19.2           py36h54aff64_0  
[conda] numpy-base                1.19.2           py36hfa32c7d_0  
[conda] pytorch                   1.7.1           py3.6_cuda10.1.243_cudnn7.6.3_0    pytorch
[conda] torchaudio                0.7.2                      py36    pytorch
[conda] torchvision               0.8.2                py36_cu101    pytorch

Is there any special requirement when creating the environment with conda?

What Python version would be required with torch 1.7.1 and CUDA 10.1?

I created a clean environment using the command

conda create --name experiment1

Activated the environment with

conda activate experiment1

Installed PyTorch using the command

conda install pytorch==1.7.0 cudatoolkit=11.0 -c pytorch

Started python3 using

python3

Then used

>>> import torch
>>> torch.__version__

Which returned
'2.2.1+cu121'

>>> torch.version.cuda

returned
'12.1'
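
I am not sure whether python3 actually picked up the interpreter from the experiment1 environment here. A quick check of which interpreter and which torch installation are being used (sys.executable and torch.__file__ are standard attributes) would be:

>>> import sys, torch
>>> sys.executable     # should point into .../anaconda3/envs/experiment1/ if the right environment is active
>>> torch.__file__     # shows which site-packages the imported torch comes from
>>> torch.__version__  # '1.7.0' would be expected if the experiment1 install is the one imported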

Cross-post from here.