torch.cuda.is_available() is False

I trained a network, and when I started a new process I noticed that only the CPU was being used, and I got this UserWarning:

Python 3.8.5 (default, Sep  4 2020, 07:30:14) 
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> print(torch.cuda.is_available())
/home/arams/anaconda3/lib/python3.8/site-packages/torch/cuda/__init__.py:52: UserWarning: CUDA initialization: CUDA unknown error - this may be due to an incorrectly set up environment, e.g. changing env variable CUDA_VISIBLE_DEVICES after program start. Setting the available devices to be zero. (Triggered internally at  /opt/conda/conda-bld/pytorch_1607370172916/work/c10/cuda/CUDAFunctions.cpp:100.)
  return torch._C._cuda_getDeviceCount() > 0
False
>>> print(torch.version.cuda())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'str' object is not callable
>>> print(torch.version.cuda)
11.0


When I run nvidia-smi, I get:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.39       Driver Version: 460.39       CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce RTX 3070    Off  | 00000000:23:00.0  On |                  N/A |
| 33%   34C    P5    19W / 220W |    235MiB /  7959MiB |      1%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1020      G   /usr/lib/xorg/Xorg                103MiB |
|    0   N/A  N/A      1278      G   /usr/bin/gnome-shell              126MiB |
|    0   N/A  N/A      2418      G   /usr/lib/firefox/firefox            3MiB |
+-----------------------------------------------------------------------------+

I am very confused. I tried to downgrade to CUDA 10.2, but that didn't work, so I upgraded again to 11.0. Currently I have the 460.39 driver on my NVIDIA card, and the OS is Ubuntu 20.04. Help and guidance is greatly appreciated.


I had a similar problem and it turned out that I had forgotten to install nvcc (the NVIDIA CUDA compiler). Did you install it?
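
You can check from a shell; nvcc only shows up if the full CUDA toolkit is installed and on your PATH:

which nvcc        # prints the path if the CUDA compiler is installed
nvcc --version    # prints the toolkit version it belongs to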

I actually don't know, because it worked before and I ran several trainings. There was one time I pressed Ctrl-Z to exit the program, and after that CUDA stopped working.

Hi Einrone!

I had a similar situation on a Linux machine where CUDA worked and
then it didn't. Rebooting made the problem go away. (The speculation
was that somehow multiple CUDA drivers got activated and were
conflicting with one another. There might be a gentler way of
restarting the CUDA driver than rebooting, but I don't know what it
might be.)
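
If you want to experiment, one thing people sometimes try instead of a full reboot is reloading the NVIDIA kernel modules. This is just a sketch, untested here; make sure nothing is using the GPU first (check nvidia-smi):

sudo rmmod nvidia_uvm       # unload the unified-memory module
sudo modprobe nvidia_uvm    # load it again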

Best.

K. Frank

I tried to reboot the computer several times, but no luck. Or did you mean rebooting something else?

I downgraded PyTorch to version 1.6 and cudatoolkit to 10.2. The UserWarning disappeared, but torch.cuda.is_available() still returns False. I tried to install the nvcc compiler, no luck :confused: I also found out and read that the nvcc compiler is not needed in order to run PyTorch with CUDA. Any other suggestions? @ptrblck
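
For anyone reading later, a typical conda command for that combination looks something like this (illustrative pins, not necessarily the exact command I used):

conda install pytorch==1.6.0 torchvision==0.7.0 cudatoolkit=10.2 -c pytorch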

If PyTorch with its CUDA runtime was working and suddenly stopped, an unwanted driver update might have been executed by your OS, which might have broken the installation (as @KFrank also mentioned).
I usually disable Ubuntu's driver updates for CUDA/NVIDIA, since they have already broken my installation a couple of times without any warning.
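
On Ubuntu, one way to do that is to put the driver packages on hold so unattended upgrades skip them (the package name below is just an example; check what is actually installed on your system first):

sudo apt-mark hold nvidia-driver-460    # replace with your installed driver package
apt-mark showhold                       # verify the hold is in place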

The best advice I have at this time is to check all NVIDIA driver installations, remove the different versions, and stick to the latest one.


How do I check all installed drivers?

You could use e.g. dpkg -l | grep -i nvidia to check for all packages that contain nvidia in their name.
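
If that shows a mix of driver versions, a rough cleanup sketch on Ubuntu would be (adapt it to your setup; ubuntu-drivers will pick the recommended driver for your GPU):

dpkg -l | grep -i nvidia                  # list every installed NVIDIA package
sudo apt-get remove --purge '^nvidia-.*'  # remove all of them
sudo ubuntu-drivers autoinstall           # reinstall the recommended driver
sudo reboot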