torch.cuda.is_available() is False with CUDA 9.0.176 and NVIDIA driver 390.77

nvidia-smi works and torch.backends.cudnn.enabled returns True, but torch.cuda.is_available() returns False. Rebooting doesn't help.
I don't know what's wrong.

My CUDA paths are set as follows:
export PATH=/home/ubuntu/cuda/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/home/ubuntu/cuda/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
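
The ${VAR:+:${VAR}} part of these exports appends a colon plus the old value only when the variable is already set and non-empty, so an unset variable does not end up with a trailing empty entry (for PATH, an empty entry means the current directory). A minimal Python sketch of the same rule, using a hypothetical helper name, just to make the expansion explicit:

```python
import os


def prepend_path(new_dir, current):
    """Mimic the shell idiom NEW${VAR:+:${VAR}} (hypothetical helper).

    The old value is appended with a ':' separator only when it is set and
    non-empty, so an unset variable does not gain a trailing empty entry.
    """
    if current:  # False for both None (unset) and "" (set but empty)
        return new_dir + ":" + current
    return new_dir


if __name__ == "__main__":
    # Same effect as the two export lines, computed for the current process.
    print("PATH=" + prepend_path("/home/ubuntu/cuda/bin", os.environ.get("PATH")))
    print("LD_LIBRARY_PATH=" + prepend_path("/home/ubuntu/cuda/lib64",
                                            os.environ.get("LD_LIBRARY_PATH")))
```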

Hi,

How did you install torch?

Hi, I used pip install torch.
I also tried installing torch from source and got the same result.

A few things to check:

  • When installing with pip, did you pick the wheel built for CUDA 9.0?
  • When installing from source, was your CUDA install detected properly? If so, is it detecting the install that you want it to use?
  • Do you have the cuda samples? Do they run properly? Does nvidia-smi run properly?
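
For the first question, a minimal sketch that prints which CUDA version the installed torch was built against (torch.version.cuda, which is None for CPU-only builds) and compares its major.minor with the local toolkit. The helper names are hypothetical, and 9.0.176 is assumed as the local toolkit version because that is what this thread reports. Note that pip wheels ship their own CUDA runtime, so a mismatch here mainly tells you which wheel you got; the driver still has to support that CUDA version.

```python
def major_minor(version):
    """Reduce a CUDA version string such as '9.0.176' to '9.0'."""
    return ".".join(version.split(".")[:2])


def wheel_matches_toolkit(torch_cuda, toolkit_version):
    """True if torch was built for the same CUDA major.minor as the toolkit.

    torch_cuda is None for CPU-only builds, which can never see a GPU.
    """
    if torch_cuda is None:
        return False
    return major_minor(torch_cuda) == major_minor(toolkit_version)


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("torch is not installed in this environment")
    else:
        # Assumption: the local toolkit is CUDA 9.0.176, as in this thread.
        print("torch.version.cuda:", torch.version.cuda)
        print("matches local toolkit:",
              wheel_matches_toolkit(torch.version.cuda, "9.0.176"))
        print("cudnn enabled:", torch.backends.cudnn.enabled)
        print("cuda available:", torch.cuda.is_available())
```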

When installing from source, the build output says it is not using cuDNN, CUDA, or NCCL.

Given that output, you might want to set CUDA_HOME=/path/to/your/cuda/install, and similar variables for cuDNN. NCCL will be compiled from source if CUDA is detected properly, so you don't need a local install (unless you have one and want to use it).
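
Why CUDA_HOME matters when CUDA lives in a non-standard place: a build script has to guess the CUDA root somehow. The sketch below is a hypothetical helper illustrating one common lookup order (CUDA_HOME, then the directory containing nvcc on PATH, then /usr/local/cuda); it is not PyTorch's actual build code, only an illustration of why an unusual install location can go undetected unless CUDA_HOME is set.

```python
import os
import shutil


def find_cuda_home(environ=None):
    """Guess the CUDA root directory (hypothetical helper, not PyTorch's code).

    Lookup order: the CUDA_HOME variable, then the parent of the directory
    holding nvcc on PATH, then the conventional /usr/local/cuda.
    Returns None if nothing is found.
    """
    env = os.environ if environ is None else environ
    home = env.get("CUDA_HOME")
    if home:
        return home
    nvcc = shutil.which("nvcc", path=env.get("PATH"))
    if nvcc:
        # Resolve symlinks, then go from .../cuda/bin/nvcc up to .../cuda
        return os.path.dirname(os.path.dirname(os.path.realpath(nvcc)))
    if os.path.isdir("/usr/local/cuda"):
        return "/usr/local/cuda"
    return None


if __name__ == "__main__":
    print("CUDA root:", find_cuda_home())
```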

I'm using a Linux container, and the CUDA root is in a different location.
I have now added CUDA_HOME to my environment variables and rebuilt from source in my conda environment, and the build compiles with CUDA correctly. But torch.cuda.is_available() still returns False.

running build_ext
-- Building with NumPy bindings
-- Detected cuDNN at /home/ubuntu/cuda/lib64, /home/ubuntu/cuda/include
-- Detected CUDA at /home/ubuntu/cuda
-- Building NCCL library
-- Building with distributed package
-- Not using NNPACK
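
To tell at a glance whether a source build picked up CUDA and cuDNN, one could scan the build output for the "-- Detected X at ..." and "-- Not using X" status lines. A small sketch, assuming the lines follow the exact format shown in the log above (the function name is hypothetical):

```python
import re

# Matches status lines such as "-- Detected CUDA at /home/ubuntu/cuda"
# or "-- Not using NNPACK" (format assumed from the log in this thread).
_DETECTED = re.compile(r"^-- Detected (\S+) at (.+)$")
_NOT_USING = re.compile(r"^-- Not using (\S+)$")


def summarize_build_log(text):
    """Return (detected, not_used): a dict name -> location, and a list of names."""
    detected, not_used = {}, []
    for line in text.splitlines():
        line = line.strip()
        m = _DETECTED.match(line)
        if m:
            detected[m.group(1)] = m.group(2)
            continue
        m = _NOT_USING.match(line)
        if m:
            not_used.append(m.group(1))
    return detected, not_used


if __name__ == "__main__":
    log = """running build_ext
-- Building with NumPy bindings
-- Detected cuDNN at /home/ubuntu/cuda/lib64, /home/ubuntu/cuda/include
-- Detected CUDA at /home/ubuntu/cuda
-- Building NCCL library
-- Building with distributed package
-- Not using NNPACK"""
    detected, not_used = summarize_build_log(log)
    print("detected:", detected)
    print("not used:", not_used)
```

If CUDA shows up in the "not used" list, the build will produce a CPU-only torch regardless of what nvidia-smi reports.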

I tried running the CUDA samples, and deviceQuery returned the following error. Neither a warm nor a cold reboot fixes it.

deviceQuery.exe Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

cudaGetDeviceCount returned 30
-> unknown error
Result = FAIL
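
The failing call is cudaGetDeviceCount itself, which returns code 30 ("unknown error"). Since this happens before any PyTorch code runs, the problem sits in the driver/runtime layer. A rough Python sketch that reproduces just this first step of deviceQuery through ctypes, assuming the CUDA 9.0 runtime is loadable as libcudart.so.9.0 or libcudart.so (unlike the sample, it links the runtime dynamically):

```python
import ctypes

# Assumption: the CUDA 9.0 runtime library is on LD_LIBRARY_PATH under one
# of these names.
DEFAULT_LIBS = ("libcudart.so.9.0", "libcudart.so")


def query_device_count(lib_names=DEFAULT_LIBS):
    """Call cudaGetDeviceCount through ctypes, like deviceQuery's first step.

    Returns (status, message, count), or None if no runtime library loads.
    status 0 is cudaSuccess; any other value is a cudaError_t code.
    """
    for name in lib_names:
        try:
            cudart = ctypes.CDLL(name)
        except OSError:
            continue  # library not found under this name, try the next one
        cudart.cudaGetErrorString.restype = ctypes.c_char_p
        count = ctypes.c_int(0)
        status = cudart.cudaGetDeviceCount(ctypes.byref(count))
        message = cudart.cudaGetErrorString(status).decode()
        return status, message, count.value
    return None


if __name__ == "__main__":
    result = query_device_count()
    if result is None:
        print("could not load the CUDA runtime library")
    else:
        status, message, count = result
        print(f"cudaGetDeviceCount returned {status} -> {message}; devices: {count}")
```

If this also reports a nonzero status, reinstalling PyTorch alone is unlikely to help until the driver and runtime agree.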

I also found that after I pip install torchvision, nvidia-smi stops working.

Hi,

If the CUDA samples don't run, then the problem is with your CUDA install.
In that case, I would advise cleanly removing every CUDA install from the system and reinstalling from scratch with the matching NVIDIA drivers. Then make sure that the CUDA samples run properly. Once they do, you can install PyTorch.

I solved it after several attempts at reinstalling PyTorch.