"CUDA is not available" after installing a different version of CUDA

Previously, I could run PyTorch without a problem. After installing a different (older) version of CUDA, I get the following error and cannot get it working again.

UserWarning: User provided device_type of 'cuda', but CUDA is not available. Disabling
  warnings.warn('User provided device_type of \'cuda\', but CUDA is not available. Disabling')

I use Windows 11 with WSL 2.
My GPU is a GeForce RTX 3080, and the CUDA version is 11.6 (installed at the factory). The nvidia-smi result is below.
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 495.29.05    Driver Version: 510.06       CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce …    On   | 00000000:01:00.0  On |                  N/A |
| N/A   61C    P8    23W /  N/A |    538MiB / 16384MiB |     N/A      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

I installed PyTorch (for CUDA 11.3) from the official site using the following command:
pip3 install torch==1.10.0+cu113 torchvision==0.11.1+cu113 torchaudio==0.10.0+cu113 -f https://download.pytorch.org/whl/cu113/torch_stable.html

This worked without any problem, even though my CUDA version is 11.6 and this PyTorch build is for CUDA 11.3.


After a while, I installed an older version of CUDA (11.5), since I wanted to use CuPy, and the latest CUDA version that CuPy supports is 11.5.

I downloaded CUDA from the following link and installed it: CUDA Toolkit 11.5 Downloads | NVIDIA Developer

As a result, I can use CuPy as expected, but I can no longer run PyTorch on the GPU. When I run PyTorch, I get the following error, and it runs on the CPU instead.

UserWarning: User provided device_type of 'cuda', but CUDA is not available. Disabling
  warnings.warn('User provided device_type of \'cuda\', but CUDA is not available. Disabling')

“torch.cuda.is_available()” returned False.
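For reference, this is roughly how I check it (a minimal sketch; the import guard is only there so the snippet also runs on machines without torch installed):

```python
# Diagnostic sketch: select "cuda" only when torch reports it as available,
# otherwise fall back to "cpu". The ImportError guard is an assumption for
# portability; on the affected machine torch itself imports fine.
def pick_device() -> str:
    try:
        import torch
    except ImportError:  # torch not installed at all
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"

print(pick_device())
```

On my machine this prints "cpu", even though the GPU is visible to nvidia-smi.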

Though I had installed CUDA 11.5, nvidia-smi still showed CUDA 11.6, while nvcc --version showed CUDA 11.5.
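As far as I understand, nvidia-smi reports the highest CUDA version the installed driver supports, while nvcc reports the locally installed toolkit, so the two can legitimately differ. A small sketch for pulling the toolkit release out of the nvcc output (the sample text below is illustrative, not captured live):

```python
import re

def nvcc_release(text: str) -> str:
    """Extract the toolkit release (e.g. '11.5') from `nvcc --version` output."""
    m = re.search(r"release (\d+\.\d+)", text)
    return m.group(1) if m else ""

# Representative nvcc output, not captured from the affected machine.
sample = """nvcc: NVIDIA (R) Cuda compiler driver
Cuda compilation tools, release 11.5, V11.5.119"""
print(nvcc_release(sample))  # -> 11.5
```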


Seeing this, I uninstalled CUDA 11.5 using the “Apps & Features” function of the Windows GUI, and I uninstalled and reinstalled PyTorch.
However, the situation described above still continues:

  • The GPU cannot be used by torch, with the error “CUDA is not available. Disabling”
  • torch.cuda.is_available() → False
  • Strangely, CuPy can still be used
  • (nvcc is uninstalled and cannot be used.)

I have been trying for a while and have no ideas left. I hope someone can give me a hint.

Based on your description, I would guess that you need to reinstall PyTorch with the CUDA 11.3 runtime, as the driver downgrade might have broken your installation.

Thank you for the reply.

I reinstalled using the following command:

pip3 install torch==1.10.0+cu113 torchvision==0.11.1+cu113 torchaudio==0.10.0+cu113 -f https://download.pytorch.org/whl/cu113/torch_stable.html

However, the situation is the same.

The following is the deviceQuery result from CUDA. It suggests there is indeed a GPU with driver version 11.6 / runtime version 11.5 …

 ./deviceQuery
./deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "NVIDIA GeForce RTX 3080 Laptop GPU"
  CUDA Driver Version / Runtime Version          11.6 / 11.5
  CUDA Capability Major/Minor version number:    8.6
  Total amount of global memory:                 16384 MBytes (17179344896 bytes)
  (048) Multiprocessors, (128) CUDA Cores/MP:    6144 CUDA Cores
  GPU Max Clock rate:                            1245 MHz (1.25 GHz)
  Memory Clock rate:                             6001 Mhz
  Memory Bus Width:                              256-bit
  L2 Cache Size:                                 4194304 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
  Maximum Layered 1D Texture Size, (num) layers  1D=(32768), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(32768, 32768), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total shared memory per multiprocessor:        102400 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  1536
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 1 copy engine(s)
  Run time limit on kernels:                     Yes
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Device supports Managed Memory:                Yes
  Device supports Compute Preemption:            Yes
  Supports Cooperative Kernel Launch:            Yes
  Supports MultiDevice Co-op Kernel Launch:      No
  Device PCI Domain ID / Bus ID / location ID:   0 / 1 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 11.6, CUDA Runtime Version = 11.5, NumDevs = 1
Result = PASS

The following are my current ideas about what may be causing the issue. What do you think?

  • The CUDA toolkit 11.5 is not supported by PyTorch, so the toolkit needs to be downgraded to 11.3

  • There are several similar directories in /usr/local/, and PyTorch is confused about which one to use. There are “cuda”, “cuda-11”, and “cuda-11.5” folders in /usr/local

No, that’s not the case, as I’m using CUDA 11.5 in a source build as well as the current CUDA 11.3 nightly and stable conda binaries and pip wheels.

The pip wheels and conda binaries use their own CUDA runtime as specified in the install command. Your local CUDA toolkit will only be used when you are building PyTorch from source or a custom CUDA extension.
To run the binaries you would only need a properly installed driver.
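One quick way to see which runtime the installed binary ships (a sketch, guarded so it also runs where torch is missing):

```python
def wheel_cuda_version():
    """Return the CUDA runtime version the installed torch binary was
    built with (e.g. '11.3' for +cu113 wheels), or None when torch is
    absent or is a CPU-only build."""
    try:
        import torch
    except ImportError:
        return None
    return torch.version.cuda

print(wheel_cuda_version())
```

This reflects the runtime bundled with the binary, independent of whatever toolkit sits in /usr/local.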

Thank you. It is helpful to know CUDA 11.5 is not the issue.

Just for additional information: the GPU worked with a CUDA sample, so the issue seems to lie not in the CUDA installation itself, but in how it is recognized by PyTorch.

/usr/local/cuda/samples/4_Finance/BlackScholes$ ./BlackScholes
[./BlackScholes] - Starting...
GPU Device 0: "Ampere" with compute capability 8.6

Initializing data...
...allocating CPU memory for options.
...allocating GPU memory for options.
...generating input data in CPU mem.
...copying input data to GPU mem.
Data init done.

Executing Black-Scholes GPU kernel (512 iterations)...
Options count             : 8000000
BlackScholesGPU() time    : 0.237344 msec
Effective memory bandwidth: 337.063867 GB/s
Gigaoptions per second    : 33.706387

BlackScholes, Throughput = 33.7064 GOptions/s, Time = 0.00024 s, Size = 8000000 options, NumDevsUsed = 1, Workgroup = 128

Reading back GPU results...
Checking the results...
...running CPU calculations.

Comparing the results...
L1 norm: 1.741792E-07
Max absolute error: 1.192093E-05

Shutting down...
...releasing GPU memory.
...releasing CPU memory.
Shutdown done.

[BlackScholes] - Test Summary

NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.

Test passed

Given that the initial setup was working, I also wouldn’t blame PyTorch but would guess that reinstalling the drivers has broken the shipped CUDA runtime.
Try to create a new virtual environment and reinstall the wheels there. If that doesn’t help, try to reinstall your initial setup.
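For reference, those steps might look like this (the wheel index URL is the one from earlier in the thread; the venv path is illustrative):

```shell
# Create a clean virtual environment and reinstall the cu113 wheels there.
python3 -m venv ~/venvs/torch-cu113
. ~/venvs/torch-cu113/bin/activate
pip3 install torch==1.10.0+cu113 torchvision==0.11.1+cu113 torchaudio==0.10.0+cu113 \
    -f https://download.pytorch.org/whl/cu113/torch_stable.html
python -c "import torch; print(torch.cuda.is_available())"
```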

EDIT: in case you haven’t done it yet, restart the laptop first and see if some dangling libs are still loaded and let me know how it goes.

Thanks.

I have tried using Nvidia NGC container with Pytorch.
(PyTorch | NVIDIA NGC)

Using this, CUDA works (torch.cuda.is_available() returns True).

I will use the container for PyTorch for now. (It would be better if I could use my local environment, but it seems the current solution is reinstalling the CUDA driver, and I am afraid this may cause further confusion and eventually lead to a factory reset, considering my configuration skills.)
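For anyone else trying this route, the container can be started along these lines (the image tag is an example; check NGC for current tags):

```shell
# Run the NGC PyTorch container with GPU access and verify CUDA visibility.
docker run --gpus all -it --rm nvcr.io/nvidia/pytorch:21.12-py3 \
    python -c "import torch; print(torch.cuda.is_available())"
```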

The docker container might be a good workaround for now.
In case you decide to try to fix your environment, check all CUDA and driver installations and see if you might have mixed them (e.g. using a .run file initially and then installing another CUDA toolkit via the .deb file, etc.). In such a case, make sure to clean the environment first and reinstall the new driver using your preferred method.

Thank you for the advice. Will do so when I reconfigure local environment.

From PyTorch Cuda with anaconda not available, it follows that you need to do a clean uninstall before you reinstall PyTorch and its attached packages. You might also check whether old files remain somewhere after the uninstall; see How can I uninstall PyTorch?.

If you want to install PyTorch, you always need to install everything needed in one go; messing around with CUDA as an additional package is error-prone. It is recommended to install PyTorch and additional packages in one go from conda:

Anaconda is our recommended package manager since it installs all dependencies.

And pip does not install all dependencies, if this post is right:

It uses preinstalled CUDA and doesn’t download own CUDA Toolkit.

See How to run pytorch with NVIDIA “cuda toolkit” version instead of the official conda “cudatoolkit” version?.

Therefore, using pip is your actual problem, because pip-installed PyTorch can be expected to need exactly the right version; otherwise, the dependencies can fail.

In short: do not use pip. Use conda with an independent “cudatoolkit”, do not use the system “cuda toolkit”, and that is also what pytorch recommends.

In case “need exactly the right version” refers to a local CUDA toolkit, then it’s wrong.
Neither the pip wheels nor the conda binaries need a local CUDA toolkit, since both ship their own CUDA runtime; the difference between the two is that the pip wheels link it statically, while the conda binaries link it dynamically.

Hi! How can I install the CUDA toolkit for PyTorch on Windows 11? The latest version PyTorch supports is 11.3, but it does not give an option to download a Windows 11 version, only Windows 10.

I don’t see a Windows 10-specific install instruction, and from the install page it also seems as if Windows 11 should be supported:

PyTorch is supported on the following Windows distributions:

Where did you find the Windows 10 limitation?