PyTorch project no longer working after upgrade to Ubuntu 22.04

I think I’m missing some general understanding of how to approach this kind of problem.

I have a Python project (GitHub - AUTOMATIC1111/stable-diffusion-webui: Stable Diffusion web UI) that has its own virtualenv with all the relevant modules.

It used to run on Ubuntu 20.04, before I upgraded to Ubuntu 22.04. Now it does not pass the torch.cuda.is_available() test.

Some version info:

>>> import torch
>>> print(torch.__version__)
2.0.1+cu118

$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Thu_Nov_18_09:45:30_PST_2021
Cuda compilation tools, release 11.5, V11.5.119
Build cuda_11.5.r11.5/compiler.30672275_0

From NVIDIA Server app:
Operating System: Linux-x86_64
NVIDIA Driver Version: 525.116.04
Graphics Processor: NVIDIA GeForce RTX 2070 Super
CUDA Cores: 2560
Total Memory: 8192 MB

I guess there’s some driver/version incompatibility going on, but I have no idea where to start. How can I debug this situation and move towards a solution?

I agree, as the failure points towards a broken driver installation.
I don’t know if you’ve reinstalled the NVIDIA drivers after updating to Ubuntu 22.04, but I would assume the update might have broken your setup.
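
To narrow down which layer is failing, it can help to check the driver and PyTorch separately. The following is only a rough sketch, not an official diagnostic: the helper names `run` and `diagnose` are made up for illustration, and it assumes `nvidia-smi` is on the PATH when the driver is installed. As far as I know, the `+cu118` pip wheels ship their own CUDA runtime, so the system `nvcc` 11.5 should not matter for `torch.cuda.is_available()`; what matters is that the driver itself works.

```python
import shutil
import subprocess


def run(cmd):
    """Return (ok, output); ok is False if the command is missing or fails."""
    if shutil.which(cmd[0]) is None:
        return False, f"{cmd[0]} not found on PATH"
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=30)
    except (OSError, subprocess.TimeoutExpired) as exc:
        return False, str(exc)
    return proc.returncode == 0, (proc.stdout or proc.stderr).strip()


def diagnose():
    """Collect driver-level and PyTorch-level status in one dict (hypothetical helper)."""
    report = {
        "torch_version": None,
        "torch_cuda_build": None,
        "cuda_available": False,
        "torch_error": None,
    }
    # Driver layer: nvidia-smi talks to the kernel module, so a failure here
    # points at the driver installation rather than at PyTorch.
    report["nvidia_smi_ok"], report["nvidia_smi_output"] = run(["nvidia-smi"])
    # Framework layer: run this inside the project's virtualenv.
    try:
        import torch
        report["torch_version"] = torch.__version__
        report["torch_cuda_build"] = torch.version.cuda
        report["cuda_available"] = bool(torch.cuda.is_available())
    except Exception as exc:  # torch missing, or its shared libraries fail to load
        report["torch_error"] = repr(exc)
    return report


if __name__ == "__main__":
    for key, value in diagnose().items():
        print(f"{key}: {value}")
```

If `nvidia_smi_ok` is False, the problem is most likely below PyTorch (driver or kernel module), and reinstalling Python packages is unlikely to help.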


Well, after purging the NVIDIA drivers, rebooting in safe mode, and reinstalling them, PyTorch now reports that CUDA is available (the torch.cuda.is_available() test passes). But not without several completely dead boots in between, where it stopped at an almost black screen with just some system messages, and then nothing. At one point I almost decided I would have to reinstall the whole desktop. Quite a stressful experience, do not recommend.

My stack also breaks repeatedly after Ubuntu updates; I think the kernel updates are doing it. They put the kernel modules out of sync with the installed CUDA libs. I think it’s because the kernel update needs a reboot but the libs don’t. It’s a pain though, as it happens frequently.
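
A mismatch like this can usually be confirmed before resorting to a purge. The sketch below is a minimal check under some assumptions: that the loaded module reports itself in `/proc/driver/nvidia/version` on an "NVRM version" line, and that `nvidia-smi` prints "Driver/library version mismatch" when the user-space libraries are newer than the loaded module. The function names are hypothetical.

```python
import re
import subprocess
from pathlib import Path

PROC_VERSION = Path("/proc/driver/nvidia/version")
MISMATCH_MARKER = "Driver/library version mismatch"


def parse_module_version(text):
    """Pull the dotted version number out of the NVRM line, or return None."""
    for line in text.splitlines():
        if line.startswith("NVRM version"):
            match = re.search(r"\b\d+\.\d+(?:\.\d+)?\b", line)
            return match.group(0) if match else None
    return None


def loaded_module_version():
    """Version of the NVIDIA kernel module loaded right now, or None."""
    try:
        return parse_module_version(PROC_VERSION.read_text())
    except OSError:
        return None


def has_version_mismatch(tool_output):
    """True if the output contains the mismatch message assumed above."""
    return MISMATCH_MARKER in tool_output


def check():
    """Return a one-line, human-readable summary of the driver state."""
    version = loaded_module_version()
    if version is None:
        return "NVIDIA kernel module not loaded (or /proc entry missing)"
    try:
        proc = subprocess.run(
            ["nvidia-smi"], capture_output=True, text=True, timeout=30
        )
    except (OSError, subprocess.TimeoutExpired) as exc:
        return f"module {version} loaded, but nvidia-smi could not run: {exc}"
    if has_version_mismatch(proc.stdout + proc.stderr):
        return f"module {version} loaded, but user-space libs differ: try a reboot"
    return f"module {version} loaded, nvidia-smi exit code {proc.returncode}"


if __name__ == "__main__":
    print(check())
```

If the mismatch message shows up, a reboot (so that the module matching the updated libraries gets loaded) is usually worth trying before reinstalling anything.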