The right way to use CUDA in PyTorch on Linux: In venv, Not in conda

shenxiangzhuang · March 16, 2024, 1:54pm

Hi, I have some questions about using CUDA on Linux that have me really confused.

In short, I can use CUDA in a conda env, but not in a Python venv. I have spent a lot of time trying to make CUDA work in the venv, but failed: I keep getting False from python -c 'import torch; print(torch.cuda.is_available())'.

I just want CUDA to work in the venv the way it does in the conda env (where python -c 'import torch; print(torch.cuda.is_available())' returns True). Any suggestions would be greatly appreciated!
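To compare the two environments side by side, it can help to print a bit more than is_available(). A minimal sketch (the cuda_report helper is hypothetical, not part of PyTorch) that reports which interpreter is running, whether it is inside a venv, and, if torch imports, the CUDA version the wheel was built for:

```python
import platform
import sys


def cuda_report() -> dict:
    """Collect interpreter details and, if torch imports, its CUDA status."""
    report = {
        "executable": sys.executable,
        "python": platform.python_version(),
        # A venv has prefix != base_prefix; a conda env normally does not.
        "in_venv": sys.prefix != sys.base_prefix,
    }
    try:
        import torch
    except ImportError:
        report["torch"] = None
        return report
    report["torch"] = torch.__version__
    report["built_for_cuda"] = torch.version.cuda  # None for CPU-only wheels
    report["cuda_available"] = torch.cuda.is_available()
    report["device_count"] = torch.cuda.device_count()
    return report


if __name__ == "__main__":
    for key, value in cuda_report().items():
        print(f"{key:>15}: {value}")
```

Running it once with the conda python and once with the venv's python makes differences in the interpreter itself easy to spot.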

The details are as follows:

Hardware

GPU: NVIDIA GeForce RTX 4060 Ti

Software

OS: Ubuntu 20.04
Anaconda: 23.3.1
Python: 3.11
I installed torch in both the conda env and the venv following the PyTorch docs: https://pytorch.org/get-started/locally/
nvcc --version:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Tue_Feb_27_16:19:38_PST_2024
Cuda compilation tools, release 12.4, V12.4.99
Build cuda_12.4.r12.4/compiler.33961263_0

nvidia-smi:

+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.14              Driver Version: 550.54.14      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 4060 Ti     Off |   00000000:2B:00.0  On |                  N/A |
|  0%   44C    P5             22W /  165W |     912MiB /  16380MiB |      1%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

shenxiangzhuang · March 16, 2024, 2:04pm

More observations in venv:

The weird thing is the last error message: RuntimeError: Found no NVIDIA driver on your system.

Since CUDA works well in the conda env, the NVIDIA driver must already be installed and working. So why this error?

While searching, I found this explanation (quoted from another thread):

PyTorch binaries ship with their own CUDA runtime (as well as other CUDA libs such as cuBLAS, cuDNN, NCCL, etc.). The locally installed CUDA toolkit (12.0 in your case) will only be used if you are building PyTorch from source or a custom CUDA extension.
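Given that quote, the CUDA runtime torch uses in the venv comes from pip packages such as nvidia-cuda-runtime-cu12, not from the system toolkit. A small stdlib sketch (hypothetical helper name) that lists the nvidia-* distributions installed in the current environment; in a conda env set up from the pytorch channel the list may be empty, since conda may supply these libraries as conda packages instead:

```python
from importlib import metadata


def nvidia_wheels() -> dict:
    """Map installed nvidia-* distribution names to their versions."""
    found = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name") or ""
        if name.lower().startswith("nvidia-"):
            found[name] = dist.version
    return dict(sorted(found.items()))


if __name__ == "__main__":
    for name, version in nvidia_wheels().items():
        print(f"{name:<26} {version}")
```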

So the system CUDA toolkit doesn't matter, which means the problem must still be with the NVIDIA driver, right?
In that case, the question becomes: why can't the torch installed in the venv find the NVIDIA driver, when the torch in the conda env can? And how do I fix this driver error?
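One way to narrow this down is to ask each interpreter to load the driver library directly, bypassing torch. The sketch below assumes the driver's user-space library has the usual Linux name libcuda.so.1 and calls the CUDA driver API functions cuInit and cuDriverGetVersion through ctypes; the helper name is hypothetical. If it succeeds with the conda python but fails with the venv python, the driver is installed and the venv's interpreter simply cannot load it.

```python
import ctypes


def probe_cuda_driver(libname: str = "libcuda.so.1"):
    """Load the driver library via ctypes and query the CUDA version it supports.

    Returns (major, minor) on success, or a string describing the failure.
    """
    try:
        libcuda = ctypes.CDLL(libname)
    except OSError as exc:
        # This interpreter's dynamic loader could not find or load the library.
        return f"cannot load {libname}: {exc}"
    status = libcuda.cuInit(0)
    if status != 0:  # 0 is CUDA_SUCCESS
        return f"cuInit failed with CUresult {status}"
    version = ctypes.c_int(0)
    status = libcuda.cuDriverGetVersion(ctypes.byref(version))
    if status != 0:
        return f"cuDriverGetVersion failed with CUresult {status}"
    # Encoded as 1000 * major + 10 * minor, e.g. 12040 for CUDA 12.4.
    return version.value // 1000, (version.value % 1000) // 10


if __name__ == "__main__":
    print(probe_cuda_driver())
```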

Some test outputs

In [1]: import torch

In [2]: torch.cuda.is_available()
Out[2]: False

In [3]: torch.cuda.get_arch_list()
Out[3]: []

In [4]: torch.__path__
Out[4]: ['/home/mathewshen/test/cuda-torch/.venv/lib/python3.11/site-packages/torch']

In [5]: torch.__version__
Out[5]: '2.2.1+cu121'

In [6]: torch.Tensor([1]).to("cuda:0")
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In[6], line 1
----> 1 torch.Tensor([1]).to("cuda:0")

File ~/test/cuda-torch/.venv/lib/python3.11/site-packages/torch/cuda/__init__.py:302, in _lazy_init()
    300 if "CUDA_MODULE_LOADING" not in os.environ:
    301     os.environ["CUDA_MODULE_LOADING"] = "LAZY"
--> 302 torch._C._cuda_init()
    303 # Some of the queued calls may reentrantly call _lazy_init();
    304 # we need to just return without initializing in that case.
    305 # However, we must not let any *other* threads in!
    306 _tls.is_initializing = True

RuntimeError: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx

The pip list (in the venv):

Package                  Version
------------------------ ----------
asttokens                2.4.1
decorator                5.1.1
executing                2.0.1
filelock                 3.13.1
fsspec                   2024.3.0
ipython                  8.22.2
jedi                     0.19.1
jinja2                   3.1.3
markupsafe               2.1.5
matplotlib-inline        0.1.6
mpmath                   1.3.0
networkx                 3.2.1
numpy                    1.26.4
nvidia-cublas-cu12       12.1.3.1
nvidia-cuda-cupti-cu12   12.1.105
nvidia-cuda-nvrtc-cu12   12.1.105
nvidia-cuda-runtime-cu12 12.1.105
nvidia-cudnn-cu12        8.9.2.26
nvidia-cufft-cu12        11.0.2.54
nvidia-curand-cu12       10.3.2.106
nvidia-cusolver-cu12     11.4.5.107
nvidia-cusparse-cu12     12.1.0.106
nvidia-nccl-cu12         2.19.3
nvidia-nvjitlink-cu12    12.4.99
nvidia-nvtx-cu12         12.1.105
parso                    0.8.3
pexpect                  4.9.0
prompt-toolkit           3.0.43
ptyprocess               0.7.0
pure-eval                0.2.2
pygments                 2.17.2
six                      1.16.0
stack-data               0.6.3
sympy                    1.12
torch                    2.2.1
traitlets                5.14.2
triton                   2.2.0
typing-extensions        4.10.0
wcwidth                  0.2.13
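Comparing the versions reported above: torch.__version__ is '2.2.1+cu121' (a wheel built for CUDA 12.1), and nvidia-smi reports CUDA Version: 12.4 for the driver. A hypothetical pair of parsing helpers plus a conservative rule (driver CUDA version at least the wheel's) suggests the version combination itself is fine here:

```python
import re


def wheel_cuda_version(torch_version: str):
    """Parse the CUDA version from a local tag like '2.2.1+cu121' -> (12, 1).

    Returns None for versions without a '+cuXYZ' suffix (e.g. '+cpu').
    """
    match = re.search(r"\+cu(\d+)(\d)$", torch_version)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def driver_cuda_version(nvidia_smi_text: str):
    """Parse 'CUDA Version: 12.4' from nvidia-smi output -> (12, 4)."""
    match = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", nvidia_smi_text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def driver_covers_build(build, driver) -> bool:
    """Conservative check: the driver's CUDA version is at least the wheel's."""
    return build is not None and driver is not None and driver >= build


if __name__ == "__main__":
    build = wheel_cuda_version("2.2.1+cu121")
    driver = driver_cuda_version(
        "| NVIDIA-SMI 550.54.14   Driver Version: 550.54.14   CUDA Version: 12.4 |"
    )
    print(build, driver, driver_covers_build(build, driver))  # (12, 1) (12, 4) True
```

The rule is stricter than necessary: CUDA minor-version compatibility also lets a 12.x wheel run on somewhat older 12.x drivers, so a False result is a prompt to check further rather than proof of incompatibility.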

JuanFMontesinos · March 20, 2024, 10:03am

I can't see why this wouldn't work.
Make sure you are installing a CUDA-enabled PyTorch build, built for a CUDA version that your driver supports.
Then set your LD_LIBRARY_PATH to empty. I believe PyTorch can fall back to some other CUDA installed on your system; I've sometimes had issues with this sort of conflict.
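Following that suggestion, the check can be rerun in a child process with LD_LIBRARY_PATH taken out of the environment, so the current shell is left untouched. This sketch unsets the variable rather than setting it to an empty string; the helper name is hypothetical:

```python
import os
import subprocess
import sys

CHECK = "import torch; print(torch.cuda.is_available())"


def run_without_ld_library_path(code: str = CHECK, python: str = sys.executable) -> str:
    """Run `code` in a fresh interpreter with LD_LIBRARY_PATH removed."""
    env = {k: v for k, v in os.environ.items() if k != "LD_LIBRARY_PATH"}
    completed = subprocess.run(
        [python, "-c", code], env=env, capture_output=True, text=True, check=False
    )
    # Return stdout and stderr together so import errors are visible too.
    return (completed.stdout + completed.stderr).strip()


if __name__ == "__main__":
    print(run_without_ld_library_path())
```

From a shell, env -u LD_LIBRARY_PATH python -c 'import torch; print(torch.cuda.is_available())' does the same thing.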

shenxiangzhuang · March 21, 2024, 2:01am

Hi @JuanFMontesinos, thanks for your reply!

I finally figured it out. It was caused by a very inconspicuous issue: the Python installed via Homebrew on Linux, which I had used to create the venv, had some problem. When I reinstalled Python with apt, the problem was solved.

In summary, there was nothing wrong with torch or CUDA; the problem was the Python interpreter used to create the venv.
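To check which interpreter a venv was built on, look at the pyvenv.cfg file that python -m venv writes into the venv root; its home key records the directory of the base Python. A minimal stdlib sketch (hypothetical helper name):

```python
import sys
from pathlib import Path


def venv_base_home(prefix: str = sys.prefix):
    """Return the 'home' entry of <prefix>/pyvenv.cfg, or None if absent.

    A venv's pyvenv.cfg records the directory of the Python that created it.
    Conda envs and plain interpreters have no pyvenv.cfg, so they yield None.
    """
    cfg = Path(prefix) / "pyvenv.cfg"
    if not cfg.is_file():
        return None
    for line in cfg.read_text().splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == "home":
            return value.strip()
    return None


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    print("base prefix:", sys.base_prefix)
    print("venv home:  ", venv_base_home())
```

Run inside the venv, it prints the base interpreter's directory. If that points into the Homebrew prefix (by default /home/linuxbrew/.linuxbrew on Linux), the venv is using the Homebrew Python and can be recreated with a different interpreter.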

(PS: I still don't know why the Python installed via Linux Homebrew has this problem. Maybe some compatibility issue.)