The right way to use CUDA in PyTorch on Linux: In venv, Not in conda

shenxiangzhuang · March 16, 2024, 2:04pm

More observations in venv:

The weird thing is the last error message: RuntimeError: Found no NVIDIA driver on your system.

Since CUDA works fine in the conda env, the NVIDIA driver must already be installed and working. So why does this error appear here?

After checking some resources, I found this:

PyTorch binaries ship with their own CUDA runtime (as well as other CUDA libs such as cuBLAS, cuDNN, NCCL, etc.). The locally installed CUDA toolkit (12.0 in your case) will only be used if you are building PyTorch from source or a custom CUDA extension.

So the system CUDA toolkit doesn't matter, which means the problem still lies with the NVIDIA driver, right?
In that case, the question becomes: why can't the torch installed in the venv find the NVIDIA driver, while the torch in the conda env can? And how do I fix the NVIDIA driver error?
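One way to narrow this down is to take PyTorch out of the loop and ask each interpreter directly whether it can load the driver library. The sketch below is only a diagnostic idea, not something from the thread: `probe_driver` is a hypothetical helper name, and it assumes the driver's user-space library has its usual Linux name, `libcuda.so.1`. Running it once with the venv's Python and once with the conda env's Python should show whether the two processes see a different `LD_LIBRARY_PATH` or get a different result from `cuInit`.

```python
import ctypes
import os
import sys


def probe_driver(lib_name="libcuda.so.1"):
    """Try to load the NVIDIA driver library and query it, without PyTorch.

    `lib_name` is an assumption: libcuda.so.1 is the usual name on Linux.
    """
    info = {
        "python": sys.executable,
        "LD_LIBRARY_PATH": os.environ.get("LD_LIBRARY_PATH", ""),
        "loaded": False,
        "cuInit": None,
        "driver_version": None,
    }
    try:
        libcuda = ctypes.CDLL(lib_name)
    except OSError as exc:
        # The dynamic loader could not find or open the driver library.
        info["error"] = str(exc)
        return info

    info["loaded"] = True
    # cuInit returns a CUresult; 0 is CUDA_SUCCESS.
    info["cuInit"] = libcuda.cuInit(0)
    if info["cuInit"] == 0:
        version = ctypes.c_int(0)
        if libcuda.cuDriverGetVersion(ctypes.byref(version)) == 0:
            # Encoded as 1000 * major + 10 * minor, e.g. 12020 for 12.2.
            info["driver_version"] = version.value
    return info


if __name__ == "__main__":
    for key, value in probe_driver().items():
        print(f"{key}: {value}")
```

If the library loads and `cuInit` returns 0 in one environment but not in the other, the difference is probably in the process environment (library search path, visible devices) rather than in the installed torch wheel.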

Some test outputs

In [1]: import torch

In [2]: torch.cuda.is_available()
Out[2]: False

In [3]: torch.cuda.get_arch_list()
Out[3]: []

In [4]: torch.__path__
Out[4]: ['/home/mathewshen/test/cuda-torch/.venv/lib/python3.11/site-packages/torch']

In [5]: torch.__version__
Out[5]: '2.2.1+cu121'

In [6]: torch.Tensor([1]).to("cuda:0")
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In[6], line 1
----> 1 torch.Tensor([1]).to("cuda:0")

File ~/test/cuda-torch/.venv/lib/python3.11/site-packages/torch/cuda/__init__.py:302, in _lazy_init()
    300 if "CUDA_MODULE_LOADING" not in os.environ:
    301     os.environ["CUDA_MODULE_LOADING"] = "LAZY"
--> 302 torch._C._cuda_init()
    303 # Some of the queued calls may reentrantly call _lazy_init();
    304 # we need to just return without initializing in that case.
    305 # However, we must not let any *other* threads in!
    306 _tls.is_initializing = True

RuntimeError: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx
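Since the message talks about the driver rather than the CUDA toolkit, it may also help to confirm what the system itself reports, independently of any Python environment. The following is a rough sketch under a few assumptions: the helper name `system_driver_report` is made up for illustration, `/proc/driver/nvidia/version` is where the NVIDIA kernel module normally exposes its version on Linux, and `nvidia-smi` is assumed to be on `PATH` if the driver's utilities are installed.

```python
import shutil
import subprocess
from pathlib import Path


def system_driver_report(proc_file="/proc/driver/nvidia/version"):
    """Collect what the OS says about the NVIDIA driver (hypothetical helper)."""
    report = {"kernel_module": None, "nvidia_smi": None}

    # The loaded kernel module normally describes itself in this file.
    path = Path(proc_file)
    if path.is_file():
        lines = path.read_text().splitlines()
        if lines:
            report["kernel_module"] = lines[0]

    # nvidia-smi talks to the driver through the same user-space libraries.
    smi = shutil.which("nvidia-smi")
    if smi is not None:
        try:
            result = subprocess.run(
                [smi, "--query-gpu=driver_version", "--format=csv,noheader"],
                capture_output=True,
                text=True,
                check=False,
                timeout=10,
            )
            report["nvidia_smi"] = result.stdout.strip() or result.stderr.strip()
        except (subprocess.TimeoutExpired, OSError) as exc:
            report["nvidia_smi"] = f"failed to run: {exc}"
    return report


if __name__ == "__main__":
    for key, value in system_driver_report().items():
        print(f"{key}: {value}")
```

If both entries come back populated when run from the same shell that launches the venv, the driver is present at the system level, and the problem is more likely in how that particular process locates or initializes it.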

The pip list

Package                  Version
------------------------ ----------
asttokens                2.4.1
decorator                5.1.1
executing                2.0.1
filelock                 3.13.1
fsspec                   2024.3.0
ipython                  8.22.2
jedi                     0.19.1
jinja2                   3.1.3
markupsafe               2.1.5
matplotlib-inline        0.1.6
mpmath                   1.3.0
networkx                 3.2.1
numpy                    1.26.4
nvidia-cublas-cu12       12.1.3.1
nvidia-cuda-cupti-cu12   12.1.105
nvidia-cuda-nvrtc-cu12   12.1.105
nvidia-cuda-runtime-cu12 12.1.105
nvidia-cudnn-cu12        8.9.2.26
nvidia-cufft-cu12        11.0.2.54
nvidia-curand-cu12       10.3.2.106
nvidia-cusolver-cu12     11.4.5.107
nvidia-cusparse-cu12     12.1.0.106
nvidia-nccl-cu12         2.19.3
nvidia-nvjitlink-cu12    12.4.99
nvidia-nvtx-cu12         12.1.105
parso                    0.8.3
pexpect                  4.9.0
prompt-toolkit           3.0.43
ptyprocess               0.7.0
pure-eval                0.2.2
pygments                 2.17.2
six                      1.16.0
stack-data               0.6.3
sympy                    1.12
torch                    2.2.1
traitlets                5.14.2
triton                   2.2.0
typing-extensions        4.10.0
wcwidth                  0.2.13
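To compare this list against the conda env, it can be easier to look only at the CUDA-related packages. The snippet below is a small sketch using the standard library's `importlib.metadata`; the function names are hypothetical, and the filter (anything starting with `nvidia-`, plus `torch` and `triton`) is just a guess at what is relevant here.

```python
from importlib import metadata


def filter_cuda_packages(pairs):
    """Keep only CUDA-related (name, version) pairs, sorted by name."""
    keep = {}
    for name, version in pairs:
        if name is None:
            continue  # skip distributions with broken metadata
        lowered = name.lower()
        if lowered.startswith("nvidia-") or lowered in ("torch", "triton"):
            keep[lowered] = version
    return dict(sorted(keep.items()))


def installed_cuda_packages():
    """Apply the filter to everything installed in the current environment."""
    pairs = (
        (dist.metadata.get("Name"), dist.version)
        for dist in metadata.distributions()
    )
    return filter_cuda_packages(pairs)


if __name__ == "__main__":
    for name, version in installed_cuda_packages().items():
        print(f"{name:<26} {version}")
```

Running it in both environments and diffing the output only makes the versions easier to compare; it does not by itself explain the driver error.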