nvidia-smi works and torch.backends.cudnn.enabled returns True, but torch.cuda.is_available() returns False. Rebooting doesn't help.
What could be wrong?
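To narrow down whether the failure is in the driver stack rather than in PyTorch itself, you can query the CUDA driver API directly with ctypes. This is a minimal sketch, assuming the driver library is at the usual Linux location libcuda.so.1 (your container may differ):

```python
import ctypes

def probe_cuda_driver():
    """Try to load libcuda and initialize the driver; return (ok, detail)."""
    try:
        libcuda = ctypes.CDLL("libcuda.so.1")
    except OSError as e:
        return False, f"libcuda not loadable: {e}"
    # cuInit(0) is the same first step the CUDA runtime takes internally.
    rc = libcuda.cuInit(0)
    if rc != 0:
        return False, f"cuInit failed with driver error code {rc}"
    count = ctypes.c_int()
    rc = libcuda.cuDeviceGetCount(ctypes.byref(count))
    if rc != 0:
        return False, f"cuDeviceGetCount failed with driver error code {rc}"
    return True, f"{count.value} CUDA device(s) visible"

print(probe_cuda_driver())
```

If cuInit itself fails here, no PyTorch build will ever see the GPU, and the problem is below PyTorch (driver, container device passthrough, or permissions on /dev/nvidia*).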
My CUDA paths are set as follows:
export PATH=/home/ubuntu/cuda/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/home/ubuntu/cuda/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
From the info in the setup, you might want to set CUDA_HOME=/path/to/your/cuda/install and similarly for cuDNN. NCCL will be compiled from source if CUDA is detected properly, so you don't need a local install (unless you have one and want to use it).
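Before rebuilding, it can help to sanity-check that the variables the build will read are actually set and point at real directories. A small sketch; the variable names CUDNN_LIB_DIR and CUDNN_INCLUDE_DIR are what older PyTorch source builds honored, and the default paths are just this thread's layout, so adjust both to your container:

```python
import os

# Assumed variable names and paths -- adjust to your own layout.
EXPECTED = {
    "CUDA_HOME": "/home/ubuntu/cuda",
    "CUDNN_LIB_DIR": "/home/ubuntu/cuda/lib64",
    "CUDNN_INCLUDE_DIR": "/home/ubuntu/cuda/include",
}

def check_build_env(expected=EXPECTED):
    """Return {var: (is_set, dir_exists)} for each expected build variable."""
    report = {}
    for var, default in expected.items():
        value = os.environ.get(var)
        report[var] = (value is not None, os.path.isdir(value or default))
    return report

for var, (is_set, exists) in check_build_env().items():
    print(f"{var}: set={is_set}, dir_exists={exists}")
```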
I'm using a Linux container, and my CUDA root is in a non-standard location.
I have now added CUDA_HOME to my environment variables and rebuilt PyTorch from source in my conda environment; the build detects CUDA correctly, but torch.cuda.is_available() still returns False.
running build_ext
-- Building with NumPy bindings
-- Detected cuDNN at /home/ubuntu/cuda/lib64, /home/ubuntu/cuda/include
-- Detected CUDA at /home/ubuntu/cuda
-- Building NCCL library
-- Building with distributed package
-- Not using NNPACK
I tried to run the CUDA samples and got the error below. Neither a warm reboot nor a cold reboot fixes it.
deviceQuery.exe Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
cudaGetDeviceCount returned 30
-> unknown error
Result = FAIL
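For reference, the 30 in "cudaGetDeviceCount returned 30" is a value from the CUDA runtime's cudaError enum. In the pre-CUDA-10 numbering this decodes as shown below (a partial mapping of the most relevant codes; consult your runtime's headers for the full list):

```python
# Partial mapping of legacy (pre-CUDA-10) cudaError runtime codes.
CUDA_RUNTIME_ERRORS = {
    0: "cudaSuccess",
    30: "cudaErrorUnknown",
    35: "cudaErrorInsufficientDriver",
    38: "cudaErrorNoDevice",
}

def explain(code):
    """Map a numeric CUDA runtime error code to its enum name."""
    return CUDA_RUNTIME_ERRORS.get(code, f"unrecognized code {code}")

print(explain(30))  # cudaErrorUnknown
```

cudaErrorUnknown is the least informative outcome: it usually means the runtime could not talk to the driver at all, which points at a driver/runtime mismatch or a broken install rather than a PyTorch problem.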
If the CUDA samples don't run, then the problem is with your CUDA install.
In that case, I would advise cleanly removing every CUDA install from the system and reinstalling from scratch, together with the matching NVIDIA drivers. Then make sure the CUDA samples run properly. Once they work, you can install PyTorch.
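After reinstalling, it's worth confirming that the driver supports the runtime you built against, since an old driver paired with a newer toolkit is a classic cause of exactly this deviceQuery failure. A best-effort ctypes sketch; the library filenames are assumptions about your system and may need adjusting:

```python
import ctypes

def _load(names):
    """Return the first library from `names` that loads, else None."""
    for name in names:
        try:
            return ctypes.CDLL(name)
        except OSError:
            continue
    return None

def cuda_versions():
    """Best-effort report of CUDA driver vs. runtime versions as strings."""
    probes = {
        # Assumed library names; a versioned libcudart.so.X may be all you have.
        "driver": (["libcuda.so.1", "libcuda.so"], "cuDriverGetVersion"),
        "runtime": (["libcudart.so"], "cudaRuntimeGetVersion"),
    }
    out = {}
    for key, (names, fn) in probes.items():
        lib = _load(names)
        if lib is None:
            out[key] = "unavailable"
            continue
        v = ctypes.c_int()
        getattr(lib, fn)(ctypes.byref(v))
        # Both APIs encode versions as 1000*major + 10*minor.
        out[key] = f"{v.value // 1000}.{(v.value % 1000) // 10}"
    return out

print(cuda_versions())
```

If the reported driver version is lower than the runtime version, update the NVIDIA driver before rebuilding anything.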