Torch can't use GPU, but it could before

I’m having a bizarre issue attempting to use Stable Diffusion WebUI. This is on 64-bit Windows 10 with an NVIDIA GeForce GTX 980 Ti.

Previously, everything worked out of the box. However, it suddenly stopped working, with PyTorch unable to access the GPU. I don’t recall doing anything likely to have caused this (video driver update, Python update, Windows update, etc.) and I can’t fix it now.

webui-user.bat errors out with:
AssertionError: Torch is not able to use GPU; add --skip-torch-cuda-test to COMMANDLINE_ARGS variable to disable this check

I searched many posts about this error and tried a number of things; none of them worked:

  • Deleted venv and tmp
  • Completely deleted Stable Diffusion WebUI and recloned
  • Reinstalled python 3.10.6
  • Verified that python is the one being used
  • Rebooted
  • Reinstalled the video driver
  • Tried different versions, including 516.94, 516.40. There was a specific 516.xx mentioned in one post that I was unable to find, so I tried those two. I don’t remember the version now and I can’t find the post again.
  • After the venv was set up, I switched to it and ran: pip3 install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu117
  • In the venv, I uninstalled torch, torchvision, and torchaudio, then reran that command
  • I manually made a clean virtual environment and ran that command in it

In the venv:
python -c "import torch; print(torch.randn(1).cuda())" gives RuntimeError: No CUDA GPUs are available
python -c "import torch; print(torch.cuda.is_available())" gives False

nvidia-smi gives my CUDA version as 11.7.

The only other thing I can imagine is that I have MSYS64 installed and it is in my path. However, the correct python is being found, so unless that’s bleeding into the virtual environment in some unexpected way, I don’t believe that’s the issue. I tried temporarily removing it from my path and it changed nothing.

I’m not sure what else to try. Does anything sound off?

Could you post the output of python -m torch.utils.collect_env, please?

Interesting result - hopefully it points in the right direction.

(venv) X:\stable diffusion\stable-diffusion-webui>python -m torch.utils.collect_env
Collecting environment information...
Traceback (most recent call last):
  File "C:\Program Files\Python310\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Program Files\Python310\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "X:\stable diffusion\stable-diffusion-webui\venv\lib\site-packages\torch\utils\collect_env.py", line 505, in <module>
    main()
  File "X:\stable diffusion\stable-diffusion-webui\venv\lib\site-packages\torch\utils\collect_env.py", line 488, in main
    output = get_pretty_env_info()
  File "X:\stable diffusion\stable-diffusion-webui\venv\lib\site-packages\torch\utils\collect_env.py", line 483, in get_pretty_env_info
    return pretty_str(get_env_info())
  File "X:\stable diffusion\stable-diffusion-webui\venv\lib\site-packages\torch\utils\collect_env.py", line 330, in get_env_info
    pip_version, pip_list_output = get_pip_packages(run_lambda)
  File "X:\stable diffusion\stable-diffusion-webui\venv\lib\site-packages\torch\utils\collect_env.py", line 302, in get_pip_packages
    out = run_with_pip(sys.executable + ' -mpip')
  File "X:\stable diffusion\stable-diffusion-webui\venv\lib\site-packages\torch\utils\collect_env.py", line 290, in run_with_pip
    for line in out.splitlines()
AttributeError: 'NoneType' object has no attribute 'splitlines'

It seems that your environment might have trouble running python -mpip.
What does print(torch.version.cuda) return and what are you seeing in pip list | grep -i torch?

I debugged that script a little and it looks like it’s having trouble with the space in C:\Program Files\Python310\. It may also have been having a problem with the space in the path to Stable Diffusion. I can try reinstalling everything without spaces and let you know whether that helped.
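
For context on why a space in the interpreter path can matter: the traceback above shows the command being built as one string (sys.executable + ' -mpip'), and a single string can get split at the space in "Program Files". A minimal sketch of the usual workaround, passing the arguments as a list so nothing is split, is below. The helper name run_python is hypothetical and this is only an illustration, not how collect_env is implemented.

```python
import subprocess
import sys


def run_python(args):
    """Run the current interpreter with the given arguments.

    The command is passed as a list, so a space in sys.executable
    (e.g. under "C:\\Program Files") is not treated as a separator.
    Returns (return code, stripped stdout).
    """
    result = subprocess.run(
        [sys.executable, *args],
        capture_output=True,
        text=True,
    )
    return result.returncode, result.stdout.strip()


# Example: run a trivial command through the same interpreter.
code, out = run_python(["-c", "print('ok')"])
print(code, out)
```

The same call with ["-m", "pip", "--version"] would show whether pip is reachable from that interpreter.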

As for those commands:

(venv) X:\stable_diffusion\stable-diffusion-webui>python -c "import torch; print(torch.version.cuda)"
11.7
(venv) X:\stable_diffusion\stable-diffusion-webui>pip list | Z:\msys64\usr\bin\grep.exe -i torch
torch              1.13.1+cu117
torchaudio         0.13.1+cu117
torchvision        0.14.1+cu117

Edit:
I was able to get python -m torch.utils.collect_env to work:

(venv) X:\>python -m torch.utils.collect_env
Collecting environment information...
PyTorch version: 1.13.1+cu117
Is debug build: False
CUDA used to build PyTorch: 11.7
ROCM used to build PyTorch: N/A

OS: Microsoft Windows 10 Pro
GCC version: (Rev6, Built by MSYS2 project) 12.2.0
Clang version: 15.0.5
CMake version: version 3.25.1
Libc version: N/A

Python version: 3.10.6 (tags/v3.10.6:9c7b4bd, Aug  1 2022, 21:53:49) [MSC v.1932 64 bit (AMD64)] (64-bit runtime)
Python platform: Windows-10-10.0.18362-SP0
Is CUDA available: False
CUDA runtime version: Could not collect
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: GPU 0: NVIDIA GeForce GTX 980 Ti
Nvidia driver version: 516.40
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

Versions of relevant libraries:
[pip3] numpy==1.24.1
[pip3] torch==1.13.1+cu117
[pip3] torchvision==0.14.1+cu117
[conda] Could not collect

(Note: this is after a reinstall, so there may be slight differences from the other info I gave. It still doesn’t work.)

Your environment looks alright. Are you able to run any other CUDA application in WSL2? I don’t know if you’ve installed a CUDA toolkit in this environment, but if so you could try to build any CUDA sample and execute it to verify that the GPU is indeed running.
If not, it seems PyTorch suddenly has trouble communicating with the GPU, which could indicate a broken driver installation.

I’ve solved this.

I have no idea how, but I had a bad environment variable CUDA_VISIBLE_DEVICES=2, 3. The correct value is 0. Simply removing that environment variable fixed the issue.
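
To illustrate why that value hides the GPU: CUDA_VISIBLE_DEVICES lists the device indices a process is allowed to see, and with a single card the only valid index is 0. A rough sketch of that rule is below. The function name visible_gpu_indices is made up for this example, it handles integer indices only (not GPU UUIDs or duplicate entries), and it assumes the documented CUDA behavior that enumeration stops at the first invalid entry.

```python
import os


def visible_gpu_indices(value, gpu_count):
    """Approximate which physical GPU indices remain visible.

    value: raw CUDA_VISIBLE_DEVICES string, or None if the variable is unset.
    gpu_count: number of physical GPUs in the machine.
    Simplified: integer indices only; stops at the first invalid entry.
    """
    if value is None:
        # Variable not set: every GPU is visible.
        return list(range(gpu_count))
    visible = []
    for entry in value.split(","):
        entry = entry.strip()
        if not entry.isdigit() or int(entry) >= gpu_count:
            break  # invalid entry: nothing after it is visible
        visible.append(int(entry))
    return visible


# The bad value from this thread, on a machine with one GPU:
print(visible_gpu_indices("2, 3", gpu_count=1))  # → []

# Whatever is currently set in this shell (assuming one GPU):
print(visible_gpu_indices(os.environ.get("CUDA_VISIBLE_DEVICES"), gpu_count=1))
```

An empty result matches the "No CUDA GPUs are available" error, even though the driver and nvidia-smi still report the card.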

Thanks for your help, ptrblck.

That’s interesting. Do you know what could have set this env variable?

Unfortunately, I have no idea. There are a couple of other CUDA environment variables, but they look right and deal with the toolkit I just installed.

My best guess is some other piece of software was misbehaving. I have a number of other 3D-oriented programs on this computer, like Blender and Quixel Mixer. Maybe it had something to do with MSYS.

If it ever happens again I will try to isolate it. Until then, the best advice I can give is this: if you have an issue like this, check that CUDA_VISIBLE_DEVICES isn’t set.
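
A quick way to do that check from inside the venv is to print every CUDA-related environment variable the Python process actually sees. The snippet below is a small sketch of that; the helper name cuda_env_vars is hypothetical, and matching on the "CUDA" prefix is just a convenient assumption for catching CUDA_VISIBLE_DEVICES along with toolkit variables such as CUDA_PATH.

```python
import os


def cuda_env_vars(environ):
    """Return the variables whose names start with CUDA, sorted by name."""
    return {
        name: value
        for name, value in sorted(environ.items())
        if name.upper().startswith("CUDA")
    }


found = cuda_env_vars(os.environ)
if not found:
    print("No CUDA* environment variables are set.")
for name, value in found.items():
    print(f"{name}={value}")

if "CUDA_VISIBLE_DEVICES" in found:
    print("CUDA_VISIBLE_DEVICES is set; check that it lists a GPU index that exists.")
```

If CUDA_VISIBLE_DEVICES shows up with an unexpected value, removing it from the system or user environment variables and opening a new terminal is what fixed the problem here.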