Getting PyTorch to work with the right CUDA version

rezzeh · April 6, 2022, 10:28am

It all started when I wanted to use the fastai library, which required installing PyTorch first. However, torch.cuda.is_available() always returns False and torch.version.cuda is always None.

This is on Ubuntu 18.04

nvidia-smi outputs


+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.47.03    Driver Version: 510.47.03    CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  On   | 00000000:65:00.0 Off |                  N/A |
| 41%   22C    P8    16W / 260W |     18MiB / 11264MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      2129      G   /usr/lib/xorg/Xorg                  9MiB |
|    0   N/A  N/A      2393      G   /usr/bin/gnome-shell                6MiB |
+-----------------------------------------------------------------------------+

Running collect_env gives


Collecting environment information...
PyTorch version: 1.7.1
Is debug build: False
CUDA used to build PyTorch: Could not collect
ROCM used to build PyTorch: N/A

OS: Ubuntu 18.04.5 LTS (x86_64)
GCC version: (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
Clang version: Could not collect
CMake version: Could not collect
Libc version: glibc-2.27

Python version: 3.9.12 (main, Apr  5 2022, 06:56:58)  [GCC 7.5.0] (64-bit runtime)
Python platform: Linux-4.15.0-175-generic-x86_64-with-glibc2.27
Is CUDA available: False
CUDA runtime version: 9.1.85
GPU models and configuration: GPU 0: NVIDIA GeForce RTX 2080 Ti
Nvidia driver version: 510.47.03
cuDNN version: /usr/lib/x86_64-linux-gnu/libcudnn.so.7.6.5
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: torch.backends.xnnpack.enabled

Versions of relevant libraries:
[pip3] numpy==1.19.2
[pip3] torch==1.7.1
[pip3] torchaudio==0.7.0a0+a853dff
[pip3] torchvision==0.8.0a0
[conda] _pytorch_select           0.1                       cpu_0  
[conda] blas                      1.0                         mkl  
[conda] cudatoolkit               9.0                  h13b8566_0  
[conda] ffmpeg                    4.3                  hf484d3e_0    pytorch
[conda] libmklml                  2019.0.5             h06a4308_0  
[conda] mkl                       2020.2                      256  
[conda] mkl-service               2.3.0            py39he8ac12f_0  
[conda] mkl_fft                   1.3.0            py39h54f3939_0  
[conda] mkl_random                1.0.2            py39h63df603_0  
[conda] numpy                     1.19.2           py39h89c1606_0  
[conda] numpy-base                1.19.2           py39h2ae0177_0  
[conda] pytorch                   1.7.1           cpu_py39h6a09485_0  
[conda] pytorch-mutex             1.0                         cpu    pytorch
[conda] torchaudio                0.7.2                      py39    fastchan
[conda] torchvision               0.8.2           cpu_py39ha229d99_0

and nvcc --version outputs

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Nov__3_21:07:56_CDT_2017
Cuda compilation tools, release 9.1, V9.1.85

I tried many versions and many installation methods. Since the system reported CUDA 9.1, I chose the versions shown here, but that did not work either.

However, when I looked for the CUDA install location, I found it in the directory /usr/local/cuda-10.1/

Do you think the CUDA installation is messed up? This is a server at my university that I do research on, and I suspect other students have changed things on it. I need some advice on the best way to fix CUDA and run PyTorch successfully.

ptrblck · April 6, 2022, 8:16pm

Assuming you’ve installed the pip wheels or conda binaries, you might have installed the CPU-only binaries which do not ship with the CUDA runtime and libs.

Your local CUDA toolkits won’t be used unless you build PyTorch from source or a custom CUDA extension.
To run the binaries with a CUDA runtime your system would only need to install a valid NVIDIA driver.
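
A quick way to tell which case applies is to look at torch.version.cuda together with torch.cuda.is_available(). The following is only a minimal sketch: the helper name diagnose_torch_build and its return strings are made up for illustration, and a None value for torch.version.cuda is taken to mean the binary was built without CUDA (which also covers non-CUDA builds such as ROCm).

```python
from typing import Optional


def diagnose_torch_build(cuda_build: Optional[str], cuda_available: bool) -> str:
    """Classify an install from torch.version.cuda and torch.cuda.is_available().

    Hypothetical helper, not part of PyTorch.
    """
    if cuda_build is None:
        # The binary itself was built without CUDA, so no local toolkit
        # or driver can make torch.cuda.is_available() return True.
        return "no CUDA in this build (CPU-only or non-CUDA binary)"
    if not cuda_available:
        # The binary ships a CUDA runtime, but it cannot reach a GPU,
        # which usually points at the driver or device visibility.
        return "CUDA build, but no usable GPU or driver"
    return "CUDA build, GPU usable"


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed in this environment")
    else:
        status = diagnose_torch_build(torch.version.cuda, torch.cuda.is_available())
        print(torch.__version__, "->", status)
```

With the values reported in the first post (None and False), this falls into the first branch, which matches the CPU-only packages listed by collect_env.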

rezzeh · April 14, 2022, 7:58pm

Thank you for your answer!
I actually fixed the issue by setting the CUDA_HOME path, adding it to PATH, and then installing cudatoolkit=10.1 in the conda environment.
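
One way to check whether a reinstall like this actually replaced the CPU-only packages is to scan the "[conda]" lines of the collect_env output for build strings tagged cpu. This is a rough sketch: the function name cpu_only_packages is hypothetical, and it assumes the "[conda] name version build [channel]" layout shown in the first post, with CPU builds marked by a build string that is "cpu" or starts with "cpu_".

```python
def cpu_only_packages(conda_lines):
    """Return names of packages whose build string marks them as CPU-only.

    Hypothetical helper; assumes lines shaped like
    "[conda] <name> <version> <build> [channel]".
    """
    flagged = []
    for line in conda_lines:
        parts = line.split()
        if len(parts) < 4 or parts[0] != "[conda]":
            continue  # not a complete package line
        name, build = parts[1], parts[3]
        if build == "cpu" or build.startswith("cpu_"):
            flagged.append(name)
    return flagged


# Lines copied from the collect_env output in the first post.
sample = [
    "[conda] _pytorch_select           0.1                       cpu_0  ",
    "[conda] cudatoolkit               9.0                  h13b8566_0  ",
    "[conda] pytorch                   1.7.1           cpu_py39h6a09485_0  ",
    "[conda] pytorch-mutex             1.0                         cpu    pytorch",
    "[conda] torchvision               0.8.2           cpu_py39ha229d99_0",
]
print(cpu_only_packages(sample))
```

An empty list after reinstalling would suggest that no package in the environment is still pinned to a CPU build.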