More observations in venv:
The weird thing is the last error message: RuntimeError: Found no NVIDIA driver on your system.
Since CUDA works fine in the conda env, the NVIDIA driver should already be installed and working. So why does this error appear here?
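One way to test that assumption directly, without going through PyTorch, is to load the driver library yourself from each environment's Python. This is a minimal sketch assuming Linux, where the driver installs libcuda.so.1; probe_cuda_driver is a hypothetical helper name. cuInit and cuDriverGetVersion are CUDA driver API functions, called here through ctypes.

```python
import ctypes
import ctypes.util


def probe_cuda_driver():
    """Try to load the NVIDIA driver library (libcuda) and ask it for its version.

    Returns a dict instead of raising, so the same snippet can be run
    unchanged with the venv's python and with the conda env's python.
    """
    last_error = None
    # libcuda.so.1 is installed with the NVIDIA driver (assumption: Linux).
    for name in ("libcuda.so.1", ctypes.util.find_library("cuda")):
        if not name:
            continue
        try:
            lib = ctypes.CDLL(name)
        except OSError as exc:
            last_error = str(exc)
            continue
        init_status = lib.cuInit(0)  # 0 means CUDA_SUCCESS
        version = ctypes.c_int(0)
        version_status = lib.cuDriverGetVersion(ctypes.byref(version))
        return {
            "loaded": name,
            "cuInit": init_status,
            # Encoded as 1000 * major + 10 * minor, e.g. 12020 means 12.2
            "driver_version": version.value if version_status == 0 else None,
        }
    return {"loaded": None, "error": last_error}


if __name__ == "__main__":
    print(probe_cuda_driver())
```

If the conda env's interpreter can load libcuda and the venv's cannot, that would suggest the difference lies in how each process locates the library, not in the driver installation itself.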
After checking some resources, I found this:
PyTorch binaries ship with their own CUDA runtime (as well as other CUDA libs such as cuBLAS, cuDNN, NCCL, etc.). The locally installed CUDA toolkit (12.0 in your case) will only be used if you are building PyTorch from source or a custom CUDA extension.
So the system CUDA toolkit doesn't matter, which means the problem must lie with the NVIDIA driver, right?
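The runtime/driver distinction can be checked from inside the venv. The nvidia-cuda-runtime-cu12 wheel in the pip list below carries the CUDA runtime (libcudart); as far as I know, none of these wheels carries the driver library (libcuda), which has to come from the system driver install. bundled_cuda_libs is a hypothetical helper that only uses the standard importlib.metadata module to list files a wheel installed.

```python
from importlib import metadata


def bundled_cuda_libs(dist_name="nvidia-cuda-runtime-cu12", pattern="libcudart"):
    """List files installed by a pip distribution whose file name contains `pattern`."""
    try:
        dist = metadata.distribution(dist_name)
    except metadata.PackageNotFoundError:
        return []  # wheel not installed in this environment
    # dist.files can be None if the wheel's RECORD metadata is missing
    return [str(f.locate()) for f in (dist.files or []) if pattern in f.name]


if __name__ == "__main__":
    print("runtime:", bundled_cuda_libs())
    print("driver :", bundled_cuda_libs(pattern="libcuda.so"))
```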
In that case, the question becomes: why can't the torch installed in the venv find the NVIDIA driver, while the torch in the conda env can? And how do I fix this driver error?
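Assuming both environments run on the same machine and the same driver, a reasonable first step is to compare the process environment of the two interpreters. This is a minimal sketch; env_snapshot is a hypothetical name, and the list of variables is my guess at the usual suspects, not an exhaustive one. Run it once from the activated venv and once from the conda env, then diff the two JSON outputs.

```python
import json
import os
import sys

# Assumed set of variables worth comparing; extend as needed.
KEYS = ["LD_LIBRARY_PATH", "CUDA_VISIBLE_DEVICES", "CUDA_HOME", "PATH"]


def env_snapshot():
    """Collect interpreter and environment settings that may differ between envs."""
    return {
        "executable": sys.executable,
        "prefix": sys.prefix,
        "env": {key: os.environ.get(key) for key in KEYS},
    }


if __name__ == "__main__":
    print(json.dumps(env_snapshot(), indent=2, sort_keys=True))
```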
Some test outputs:
In [1]: import torch
In [2]: torch.cuda.is_available()
Out[2]: False
In [3]: torch.cuda.get_arch_list()
Out[3]: []
In [4]: torch.__path__
Out[4]: ['/home/mathewshen/test/cuda-torch/.venv/lib/python3.11/site-packages/torch']
In [5]: torch.__version__
Out[5]: '2.2.1+cu121'
In [6]: torch.Tensor([1]).to("cuda:0")
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
Cell In[6], line 1
----> 1 torch.Tensor([1]).to("cuda:0")
File ~/test/cuda-torch/.venv/lib/python3.11/site-packages/torch/cuda/__init__.py:302, in _lazy_init()
300 if "CUDA_MODULE_LOADING" not in os.environ:
301 os.environ["CUDA_MODULE_LOADING"] = "LAZY"
--> 302 torch._C._cuda_init()
303 # Some of the queued calls may reentrantly call _lazy_init();
304 # we need to just return without initializing in that case.
305 # However, we must not let any *other* threads in!
306 _tls.is_initializing = True
RuntimeError: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx
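The checks in the session above can be bundled into one report that is easy to paste from both environments side by side. torch_cuda_report is a hypothetical helper. torch.version.cuda gives the CUDA version the wheel was built against, so a non-None value together with is_available() returning False would point at the driver lookup rather than at a CPU-only wheel.

```python
def torch_cuda_report():
    """Summarize what the installed torch build expects and what it can actually see."""
    try:
        import torch
    except ImportError:
        return {"torch": None}
    return {
        "torch": torch.__version__,
        "path": list(torch.__path__),
        "built_with_cuda": torch.version.cuda,  # None for CPU-only builds
        "is_available": torch.cuda.is_available(),
        "device_count": torch.cuda.device_count(),  # 0 when no device is usable
    }


if __name__ == "__main__":
    for key, value in torch_cuda_report().items():
        print(f"{key:15} {value}")
```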
The pip list:
Package Version
------------------------ ----------
asttokens 2.4.1
decorator 5.1.1
executing 2.0.1
filelock 3.13.1
fsspec 2024.3.0
ipython 8.22.2
jedi 0.19.1
jinja2 3.1.3
markupsafe 2.1.5
matplotlib-inline 0.1.6
mpmath 1.3.0
networkx 3.2.1
numpy 1.26.4
nvidia-cublas-cu12 12.1.3.1
nvidia-cuda-cupti-cu12 12.1.105
nvidia-cuda-nvrtc-cu12 12.1.105
nvidia-cuda-runtime-cu12 12.1.105
nvidia-cudnn-cu12 8.9.2.26
nvidia-cufft-cu12 11.0.2.54
nvidia-curand-cu12 10.3.2.106
nvidia-cusolver-cu12 11.4.5.107
nvidia-cusparse-cu12 12.1.0.106
nvidia-nccl-cu12 2.19.3
nvidia-nvjitlink-cu12 12.4.99
nvidia-nvtx-cu12 12.1.105
parso 0.8.3
pexpect 4.9.0
prompt-toolkit 3.0.43
ptyprocess 0.7.0
pure-eval 0.2.2
pygments 2.17.2
six 1.16.0
stack-data 0.6.3
sympy 1.12
torch 2.2.1
traitlets 5.14.2
triton 2.2.0
typing-extensions 4.10.0
wcwidth 0.2.13
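The list shows nvidia-cuda-runtime-cu12 12.1.105 next to a torch build reporting '2.2.1+cu121', so the bundled runtime appears to match the build. A small check like the one below confirms that automatically; parse_pip_list and runtime_matches_torch are hypothetical helpers, and the cuXYZ-to-package-name mapping is an assumption based on the names in this list. It says nothing about the driver, only about the pip-installed runtime.

```python
import re


def parse_pip_list(text):
    """Turn `pip list` output into {package: version}, skipping the header rows."""
    packages = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2 or parts[0] == "Package" or set(parts[0]) == {"-"}:
            continue
        packages[parts[0].lower()] = parts[1]
    return packages


def runtime_matches_torch(torch_version, packages):
    """Check that the pip CUDA runtime wheel matches the +cuXYZ tag of torch."""
    match = re.search(r"\+cu(\d+)(\d)$", torch_version)
    if not match:
        return False  # CPU-only or unexpected version string
    major, minor = match.group(1), match.group(2)  # '121' -> '12', '1'
    runtime = packages.get(f"nvidia-cuda-runtime-cu{major}", "")
    return runtime.startswith(f"{major}.{minor}.")
```

For example, feeding the list above and '2.2.1+cu121' should return True, since 12.1.105 starts with 12.1.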