Hi, I have some questions about using CUDA on Linux that have me very confused.
In short, I can use CUDA in a conda env, but not in a Python venv. I spent a lot of time trying to make CUDA work in the venv, but I failed: I keep getting False from python -c 'import torch; print(torch.cuda.is_available())'.
I just want to use CUDA in the venv the same way as in the conda env, where the same command returns True. Any suggestions would be greatly appreciated!
The details are as follows:
Hardware
- GPU: NVIDIA GeForce RTX 4060 Ti
Software
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Tue_Feb_27_16:19:38_PST_2024
Cuda compilation tools, release 12.4, V12.4.99
Build cuda_12.4.r12.4/compiler.33961263_0
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.14 Driver Version: 550.54.14 CUDA Version: 12.4 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA GeForce RTX 4060 Ti Off | 00000000:2B:00.0 On | N/A |
| 0% 44C P5 22W / 165W | 912MiB / 16380MiB | 1% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
More observations in venv:
The weird thing is the last error message: RuntimeError: Found no NVIDIA driver on your system.
Because CUDA works well in the conda env, the NVIDIA driver should already be installed and working. So why does this error appear here?
From checking other resources:
PyTorch binaries ship with their own CUDA runtime (as well as other CUDA libs such as cuBLAS, cuDNN, NCCL, etc.). The locally installed CUDA toolkit (12.0 in your case) will only be used if you are building PyTorch from source or a custom CUDA extension.
So the system CUDA toolkit doesn’t matter, which means the problem is still with the NVIDIA driver, right?
In that case, the question becomes: why can't the torch installed in the venv find the NVIDIA driver while the torch in the conda env can? And how do I fix the NVIDIA driver error?
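One way to narrow this down without involving torch at all is to check whether the interpreter itself can load the driver library. The following is only a rough sketch, assuming the driver's user-space library is named libcuda.so.1 (the usual name on Linux) and that the error means the process could not load or initialize it. The function name probe_nvidia_driver is made up for illustration.

```python
import ctypes


def probe_nvidia_driver(lib_name="libcuda.so.1"):
    """Try to load the NVIDIA driver library and call cuInit(0).

    Returns (ok, detail). ok is True only if the library loads
    and cuInit reports success (return code 0).
    lib_name is an assumption: the usual driver library name on Linux.
    """
    try:
        lib = ctypes.CDLL(lib_name)
    except OSError as exc:
        # The dynamic loader of this interpreter could not find or open it.
        return False, f"could not load {lib_name}: {exc}"

    try:
        cu_init = lib.cuInit
    except AttributeError:
        return False, f"{lib_name} loaded but has no cuInit symbol"

    # CUDA driver API: CUresult cuInit(unsigned int Flags)
    cu_init.argtypes = [ctypes.c_uint]
    cu_init.restype = ctypes.c_int
    rc = cu_init(0)
    if rc != 0:
        return False, f"cuInit returned error code {rc}"
    return True, "driver library loaded and cuInit succeeded"


if __name__ == "__main__":
    print(probe_nvidia_driver())
```

Running this once with the venv's python and once with the conda env's python would show whether the two interpreters differ in their ability to load the driver, independent of how torch was installed.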
Some test outputs
In [1]: import torch
In [2]: torch.cuda.is_available()
Out[2]: False
In [3]: torch.cuda.get_arch_list()
Out[3]: []
In [4]: torch.__path__
Out[4]: ['/home/mathewshen/test/cuda-torch/.venv/lib/python3.11/site-packages/torch']
In [5]: torch.__version__
Out[5]: '2.2.1+cu121'
In [6]: torch.Tensor([1]).to("cuda:0")
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
Cell In[6], line 1
----> 1 torch.Tensor([1]).to("cuda:0")
File ~/test/cuda-torch/.venv/lib/python3.11/site-packages/torch/cuda/__init__.py:302, in _lazy_init()
300 if "CUDA_MODULE_LOADING" not in os.environ:
301 os.environ["CUDA_MODULE_LOADING"] = "LAZY"
--> 302 torch._C._cuda_init()
303 # Some of the queued calls may reentrantly call _lazy_init();
304 # we need to just return without initializing in that case.
305 # However, we must not let any *other* threads in!
306 _tls.is_initializing = True
RuntimeError: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx
The pip list
Package Version
------------------------ ----------
asttokens 2.4.1
decorator 5.1.1
executing 2.0.1
filelock 3.13.1
fsspec 2024.3.0
ipython 8.22.2
jedi 0.19.1
jinja2 3.1.3
markupsafe 2.1.5
matplotlib-inline 0.1.6
mpmath 1.3.0
networkx 3.2.1
numpy 1.26.4
nvidia-cublas-cu12 12.1.3.1
nvidia-cuda-cupti-cu12 12.1.105
nvidia-cuda-nvrtc-cu12 12.1.105
nvidia-cuda-runtime-cu12 12.1.105
nvidia-cudnn-cu12 8.9.2.26
nvidia-cufft-cu12 11.0.2.54
nvidia-curand-cu12 10.3.2.106
nvidia-cusolver-cu12 11.4.5.107
nvidia-cusparse-cu12 12.1.0.106
nvidia-nccl-cu12 2.19.3
nvidia-nvjitlink-cu12 12.4.99
nvidia-nvtx-cu12 12.1.105
parso 0.8.3
pexpect 4.9.0
prompt-toolkit 3.0.43
ptyprocess 0.7.0
pure-eval 0.2.2
pygments 2.17.2
six 1.16.0
stack-data 0.6.3
sympy 1.12
torch 2.2.1
traitlets 5.14.2
triton 2.2.0
typing-extensions 4.10.0
wcwidth 0.2.13
I cannot imagine why this wouldn't work.
Be sure that you are installing PyTorch with CUDA, and with a CUDA version that is supported by your driver.
Then set your LD_LIBRARY_PATH to empty. I believe PyTorch can fall back to some other CUDA installed on your system. I’ve sometimes had issues with this sort of conflict.
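To try the LD_LIBRARY_PATH suggestion without changing the current shell, the check can be run in a child process whose environment has that variable removed. This is a minimal sketch; the helper names env_without_ld_library_path and run_check are hypothetical, and it assumes torch is installed for the interpreter being tested.

```python
import os
import subprocess
import sys

# The same one-liner used earlier in this thread.
CHECK = "import torch; print(torch.cuda.is_available())"


def env_without_ld_library_path(base_env=None):
    """Return a copy of the environment with LD_LIBRARY_PATH removed."""
    env = dict(os.environ if base_env is None else base_env)
    env.pop("LD_LIBRARY_PATH", None)
    return env


def run_check(python=sys.executable):
    """Run the CUDA check in a child process without LD_LIBRARY_PATH.

    Returns (returncode, stdout, stderr) of the child process.
    """
    result = subprocess.run(
        [python, "-c", CHECK],
        env=env_without_ld_library_path(),
        capture_output=True,
        text=True,
    )
    return result.returncode, result.stdout.strip(), result.stderr.strip()


if __name__ == "__main__":
    print(run_check())
```

If the output changes from False to True with the variable removed, a conflicting library on LD_LIBRARY_PATH is a likely suspect; if it stays False, the cause is probably elsewhere.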
Hi @JuanFMontesinos, thanks for your reply!
I figured it out recently. The cause was very inconspicuous: the Python installed by Linux Homebrew, which I used to create the venv, had some problem in it. When I reinstalled Python with apt, the problem was solved.
In summary, there is no problem with torch or CUDA; the problem is the Python interpreter used to create the venv.
(PS: I still don’t know why the Python installed by Linux Homebrew has this problem. Maybe it is due to some compatibility problem.)
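For anyone hitting the same thing, it may help to record which base interpreter a venv is built on before reinstalling anything. The sketch below uses only the standard library; describe_interpreter is a made-up helper name, and it does not explain why one interpreter fails, it only shows where the interpreter comes from.

```python
import os
import sys
import sysconfig


def describe_interpreter():
    """Collect basic facts about the running Python interpreter."""
    return {
        # Path of the python binary being run (inside the venv, if any).
        "executable": sys.executable,
        # The same path with symlinks resolved, which usually reveals
        # whether it comes from apt, Homebrew, conda, etc.
        "real_executable": os.path.realpath(sys.executable),
        "prefix": sys.prefix,
        # In a venv, base_prefix points at the interpreter it was created from.
        "base_prefix": sys.base_prefix,
        "in_venv": sys.prefix != sys.base_prefix,
        # Library directory the interpreter was configured with (may be None).
        "libdir": sysconfig.get_config_var("LIBDIR"),
    }


if __name__ == "__main__":
    for key, value in describe_interpreter().items():
        print(f"{key}: {value}")
```

Comparing this output between the working conda env and the failing venv shows at a glance whether the venv is built on a Homebrew Python or a system one.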