nvidia-smi works and torch.backends.cudnn.enabled returns True, but torch.cuda.is_available() returns False. Rebooting doesn't help.
What could be wrong?
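To narrow down whether the failure is in the driver stack rather than in PyTorch itself, you can query the CUDA driver API directly with ctypes. This is a minimal sketch, assuming the driver library is at the usual Linux location libcuda.so.1 (your container may differ):

```python
import ctypes

def probe_cuda_driver():
    """Try to load libcuda and initialize the driver; return (ok, detail)."""
    try:
        libcuda = ctypes.CDLL("libcuda.so.1")
    except OSError as e:
        return False, f"libcuda not loadable: {e}"
    # cuInit(0) is the same first step the CUDA runtime takes internally.
    rc = libcuda.cuInit(0)
    if rc != 0:
        return False, f"cuInit failed with driver error code {rc}"
    count = ctypes.c_int()
    rc = libcuda.cuDeviceGetCount(ctypes.byref(count))
    if rc != 0:
        return False, f"cuDeviceGetCount failed with driver error code {rc}"
    return True, f"{count.value} CUDA device(s) visible"

print(probe_cuda_driver())
```

If cuInit itself fails here, no PyTorch build will ever see the GPU, and the problem is below PyTorch (driver, container device passthrough, or permissions on /dev/nvidia*).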
My CUDA paths are set as follows:
export PATH=/home/ubuntu/cuda/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/home/ubuntu/cuda/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
From the info in the setup, you might want to set CUDA_HOME=/path/to/your/cuda/install and similarly for cuDNN. NCCL will be compiled from source if CUDA is detected properly, so you don't need a local install (unless you have one and want to use it).
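Before rebuilding, it can help to sanity-check that the variables the build will read are actually set and point at real directories. A small sketch; the variable names CUDNN_LIB_DIR and CUDNN_INCLUDE_DIR are what older PyTorch source builds honored, and the default paths are just this thread's layout, so adjust both to your container:

```python
import os

# Assumed variable names and paths -- adjust to your own layout.
EXPECTED = {
    "CUDA_HOME": "/home/ubuntu/cuda",
    "CUDNN_LIB_DIR": "/home/ubuntu/cuda/lib64",
    "CUDNN_INCLUDE_DIR": "/home/ubuntu/cuda/include",
}

def check_build_env(expected=EXPECTED):
    """Return {var: (is_set, dir_exists)} for each expected build variable."""
    report = {}
    for var, default in expected.items():
        value = os.environ.get(var)
        report[var] = (value is not None, os.path.isdir(value or default))
    return report

for var, (is_set, exists) in check_build_env().items():
    print(f"{var}: set={is_set}, dir_exists={exists}")
```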
I'm using a Linux container, and my CUDA root is in a non-standard location.
I have now added CUDA_HOME to my environment variables and rebuilt PyTorch from source in my conda environment; the build detects CUDA correctly, but torch.cuda.is_available() still returns False.
running build_ext
-- Building with NumPy bindings
-- Detected cuDNN at /home/ubuntu/cuda/lib64, /home/ubuntu/cuda/include
-- Detected CUDA at /home/ubuntu/cuda
-- Building NCCL library
-- Building with distributed package
-- Not using NNPACK
I tried to run the CUDA samples and got the error below. Neither a warm reboot nor a cold reboot fixes it.
deviceQuery.exe Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
cudaGetDeviceCount returned 30
-> unknown error
Result = FAIL
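For reference, the 30 in "cudaGetDeviceCount returned 30" is a value from the CUDA runtime's cudaError enum. In the pre-CUDA-10 numbering this decodes as shown below (a partial mapping of the most relevant codes; consult your runtime's headers for the full list):

```python
# Partial mapping of legacy (pre-CUDA-10) cudaError runtime codes.
CUDA_RUNTIME_ERRORS = {
    0: "cudaSuccess",
    30: "cudaErrorUnknown",
    35: "cudaErrorInsufficientDriver",
    38: "cudaErrorNoDevice",
}

def explain(code):
    """Map a numeric CUDA runtime error code to its enum name."""
    return CUDA_RUNTIME_ERRORS.get(code, f"unrecognized code {code}")

print(explain(30))  # cudaErrorUnknown
```

cudaErrorUnknown is the least informative outcome: it usually means the runtime could not talk to the driver at all, which points at a driver/runtime mismatch or a broken install rather than a PyTorch problem.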
If the CUDA samples don't run, then the problem is with your CUDA install.
In that case, I would advise cleanly removing every CUDA install from the system and reinstalling from scratch, together with the matching NVIDIA drivers. Then make sure the CUDA samples run properly. Once they work, you can install PyTorch.
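After reinstalling, it's worth confirming that the driver supports the runtime you built against, since an old driver paired with a newer toolkit is a classic cause of exactly this deviceQuery failure. A best-effort ctypes sketch; the library filenames are assumptions about your system and may need adjusting:

```python
import ctypes

def _load(names):
    """Return the first library from `names` that loads, else None."""
    for name in names:
        try:
            return ctypes.CDLL(name)
        except OSError:
            continue
    return None

def cuda_versions():
    """Best-effort report of CUDA driver vs. runtime versions as strings."""
    probes = {
        # Assumed library names; a versioned libcudart.so.X may be all you have.
        "driver": (["libcuda.so.1", "libcuda.so"], "cuDriverGetVersion"),
        "runtime": (["libcudart.so"], "cudaRuntimeGetVersion"),
    }
    out = {}
    for key, (names, fn) in probes.items():
        lib = _load(names)
        if lib is None:
            out[key] = "unavailable"
            continue
        v = ctypes.c_int()
        getattr(lib, fn)(ctypes.byref(v))
        # Both APIs encode versions as 1000*major + 10*minor.
        out[key] = f"{v.value // 1000}.{(v.value % 1000) // 10}"
    return out

print(cuda_versions())
```

If the reported driver version is lower than the runtime version, update the NVIDIA driver before rebuilding anything.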