Hi! I just migrated to Ubuntu on my ASUS TUF laptop and am having trouble getting Stable Diffusion (the AUTOMATIC1111 webui repo) up and running, because PyTorch cannot use my GPU.
My setup:
I installed the NVIDIA drivers via apt (I also tried 525, which didn't work either). Currently nvidia-smi gives me:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.86.01    Driver Version: 515.86.01    CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:01:00.0 Off |                  N/A |
| N/A   38C    P8     1W /  N/A |    203MiB /  4096MiB |      4%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1692      G   /usr/lib/xorg/Xorg                 81MiB |
|    0   N/A  N/A      1947      G   /usr/bin/gnome-shell              119MiB |
+-----------------------------------------------------------------------------+
My lshw output:
$ sudo lshw -c display
[sudo] password for sd:
  *-display
       description: VGA compatible controller
       product: TU117M [GeForce GTX 1650 Mobile / Max-Q]
       vendor: NVIDIA Corporation
       physical id: 0
       bus info: pci@0000:01:00.0
       version: a1
       width: 64 bits
       clock: 33MHz
       capabilities: pm msi pciexpress vga_controller bus_master cap_list rom
       configuration: driver=nvidia latency=0
       resources: irq:81 memory:f6000000-f6ffffff memory:c0000000-cfffffff memory:d0000000-d1ffffff ioport:f000(size=128) memory:f7000000-f707ffff
  *-display
       description: VGA compatible controller
       product: Picasso/Raven 2 [Radeon Vega Series / Radeon Vega Mobile Series]
       vendor: Advanced Micro Devices, Inc. [AMD/ATI]
       physical id: 0
       bus info: pci@0000:05:00.0
       logical name: /dev/fb0
       version: c2
       width: 64 bits
       clock: 33MHz
       capabilities: pm pciexpress msi msix vga_controller bus_master cap_list fb
       configuration: depth=32 driver=amdgpu latency=0 resolution=1920,1080
       resources: irq:24 memory:e0000000-efffffff memory:f0000000-f01fffff ioport:c000(size=256) memory:f7500000-f757ffff
But launching Stable Diffusion gives me:
Python 3.10.6 (main, Nov 14 2022, 16:10:14) [GCC 11.3.0]
Commit hash: 226d840e84c5f306350b0681945989b86760e616
Traceback (most recent call last):
  File "/home/sd/stable_diffusion_stuff/stable-diffusion-webui/launch.py", line 360, in <module>
    prepare_environment()
  File "/home/sd/stable_diffusion_stuff/stable-diffusion-webui/launch.py", line 272, in prepare_environment
    run_python("import torch; assert torch.cuda.is_available(), 'Torch is not able to use GPU; add --skip-torch-cuda-test to COMMANDLINE_ARGS variable to disable this check'")
  File "/home/sd/stable_diffusion_stuff/stable-diffusion-webui/launch.py", line 129, in run_python
    return run(f'"{python}" -c "{code}"', desc, errdesc)
  File "/home/sd/stable_diffusion_stuff/stable-diffusion-webui/launch.py", line 105, in run
    raise RuntimeError(message)
RuntimeError: Error running command.
Command: "/home/sd/stable_diffusion_stuff/stable-diffusion-webui/venv/bin/python3" -c "import torch; assert torch.cuda.is_available(), 'Torch is not able to use GPU; add --skip-torch-cuda-test to COMMANDLINE_ARGS variable to disable this check'"
Error code: 1
stdout: <empty>
stderr: /home/sd/stable_diffusion_stuff/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/cuda/__init__.py:88: UserWarning: HIP initialization: Unexpected error from hipGetDeviceCount(). Did you run some cuda functions before calling NumHipDevices() that might have already set an error? Error 101: hipErrorInvalidDevice (Triggered internally at ../c10/hip/HIPFunctions.cpp:110.)
  return torch._C._cuda_getDeviceCount() > 0
Traceback (most recent call last):
  File "<string>", line 1, in <module>
AssertionError: Torch is not able to use GPU; add --skip-torch-cuda-test to COMMANDLINE_ARGS variable to disable this check
If I open Python inside the venv, torch sees that there is one device, but the moment I try to get more info or do anything with it, it errors out:
Python 3.10.6 (main, Nov 14 2022, 16:10:14) [GCC 11.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.cuda.device_count()
1
>>> torch.cuda.get_device_name(0)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/sd/stable_diffusion_stuff/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/cuda/__init__.py", line 341, in get_device_name
    return get_device_properties(device).name
  File "/home/sd/stable_diffusion_stuff/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/cuda/__init__.py", line 371, in get_device_properties
    _lazy_init()  # will define _get_device_properties
  File "/home/sd/stable_diffusion_stuff/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/cuda/__init__.py", line 229, in _lazy_init
    torch._C._cuda_init()
RuntimeError: Unexpected error from hipGetDeviceCount(). Did you run some cuda functions before calling NumHipDevices() that might have already set an error? Error 101: hipErrorInvalidDevice
>>> torch.cuda.is_available()
/home/sd/stable_diffusion_stuff/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/cuda/__init__.py:88: UserWarning: HIP initialization: Unexpected error from hipGetDeviceCount(). Did you run some cuda functions before calling NumHipDevices() that might have already set an error? Error 101: hipErrorInvalidDevice (Triggered internally at ../c10/hip/HIPFunctions.cpp:110.)
  return torch._C._cuda_getDeviceCount() > 0
False
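One thing that stands out to me: the errors mention hipGetDeviceCount(), and HIP is AMD's ROCm runtime, so my current guess is that the venv somehow ended up with a ROCm build of torch instead of a CUDA one (my machine has both an NVIDIA dGPU and an AMD iGPU, per the lshw output above). As I understand it, the wheel's local version tag is how you tell the builds apart; here is a small sketch of that check with hypothetical version strings (the real one comes from `pip show torch` inside the venv):

```python
# Distinguish a ROCm torch wheel from a CUDA one by the local version tag
# after the "+" (e.g. "1.13.1+rocm5.2" vs "1.13.1+cu117").
# The version strings below are hypothetical examples, not my actual output.
def torch_backend(version: str) -> str:
    local = version.partition("+")[2]  # "" when there is no "+tag" suffix
    if local.startswith("rocm"):
        return "ROCm (HIP)"
    if local.startswith("cu"):
        return "CUDA"
    return "CPU-only or unknown"

print(torch_backend("1.13.1+rocm5.2"))  # ROCm (HIP)
print(torch_backend("1.13.1+cu117"))    # CUDA
```

If my venv's torch really does report a +rocm tag, that would explain why it tries HIP and fails on NVIDIA hardware.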
I am very lost at this point. I have tried NVIDIA's PPA drivers as well as the Ubuntu drivers and cannot find any setup that makes it past this point. I appreciate any and all help you can offer. Thank you!
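For what it's worth, the next thing I was planning to try, assuming my guess about a ROCm wheel is right, is forcing the webui to reinstall the CUDA build of torch via the TORCH_COMMAND variable that launch.py reads in prepare_environment(). The exact versions and index URL below are my assumption of what this commit of the repo expects, so please correct me if they are wrong:

```shell
# Hypothetical fix attempt: make launch.py install the CUDA (cu117) torch wheels
# instead of whatever is currently in the venv. Run from the stable-diffusion-webui dir.
export TORCH_COMMAND="pip install torch==1.13.1+cu117 torchvision==0.14.1+cu117 --extra-index-url https://download.pytorch.org/whl/cu117"
./webui.sh
```

I have not run this yet; I wanted to check first whether a bad torch build is even a plausible explanation.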