torch.cuda.is_available() returns False after suspend

r2d2bol · October 20, 2020, 4:50am

This is not reliably reproducible, but I have observed that sometimes, after waking from suspend, torch.cuda.is_available() returns False. Running nvidia-smi still detects all GPUs on the system. After a reboot, torch.cuda.is_available() returns True again.

Is there any way to make torch detect cuda without rebooting?

Some details about the system:

Ubuntu 20.04
CUDA 10.2 (installed via conda)
NVIDIA driver 450.66
GPU: RTX 2080

ptrblck · October 21, 2020, 2:18am

You could try to reload the nvidia kernel module as described here.
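The link target is not reproduced in this copy of the thread. A commonly suggested form of this fix is to unload and reload the `nvidia_uvm` module (`sudo rmmod nvidia_uvm`, then `sudo modprobe nvidia_uvm`); treat that module name as an assumption rather than a quote of the linked post. A minimal sketch of the check-and-reload flow, with hypothetical helper names, might look like this:

```python
import subprocess
import sys

# Assumption: nvidia_uvm is the module that needs reloading after resume.
# rmmod will fail if any process is still using CUDA, so stop those first.
RELOAD_COMMANDS = [
    ["sudo", "rmmod", "nvidia_uvm"],
    ["sudo", "modprobe", "nvidia_uvm"],
]


def cuda_available_in_fresh_process(python=sys.executable):
    """Ask a new interpreter whether torch sees CUDA.

    A fresh process is used so the answer does not depend on any CUDA
    state the current process may already hold.
    """
    check = "import sys, torch; sys.exit(0 if torch.cuda.is_available() else 1)"
    result = subprocess.run([python, "-c", check], capture_output=True)
    return result.returncode == 0


def reload_uvm(run=subprocess.run):
    """Unload and reload the module; `run` is injectable for dry runs."""
    for cmd in RELOAD_COMMANDS:
        run(cmd, check=True)


def recover():
    """Reload the module only if CUDA is not visible, then re-check."""
    if cuda_available_in_fresh_process():
        return True
    reload_uvm()
    return cuda_available_in_fresh_process()
```

Calling `recover()` from a terminal session prompts for the sudo password if needed. Any Python process that was already running when the check failed probably has to be restarted before it can use the GPU.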

jtchilders · October 13, 2021, 1:21pm

I see a similar issue and I cannot reset or reload the driver because my Ubuntu Xorg is using it to drive the monitor.
```
5.11.0-37-generic #41~20.04.2-Ubuntu SMP

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.91.03    Driver Version: 460.91.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce RTX 207...  Off  | 00000000:01:00.0  On |                  N/A |
| N/A   35C    P8     3W /  N/A |    378MiB /  7974MiB |      3%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1007      G   /usr/lib/xorg/Xorg                 35MiB |
|    0   N/A  N/A      2032      G   /usr/lib/xorg/Xorg                191MiB |
|    0   N/A  N/A      2167      G   /usr/bin/gnome-shell               30MiB |
|    0   N/A  N/A     17865      G   ...AAAAAAAAA= --shared-files       76MiB |
|    0   N/A  N/A     22353      G   ...AAAAAAAAA= --shared-files       30MiB |
+-----------------------------------------------------------------------------+
```

ptrblck · October 13, 2021, 7:48pm

My GPU also drives the desktop display, and the linked commands work fine.
However, I’m not familiar with your setup, so a restart might be unavoidable.
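One way to see whether a reload is likely to succeed while the desktop is running is to look at the reference counts of the loaded NVIDIA modules. The sketch below parses `/proc/modules`, where the third field of each line is the module's use count; the function names are hypothetical. It assumes that the display stack holds the other NVIDIA modules rather than `nvidia_uvm`, which is worth verifying on your own machine.

```python
from pathlib import Path
from typing import Dict


def module_refcounts(proc_modules_text: str) -> Dict[str, int]:
    """Map each nvidia* module name to its use count.

    Expects text in /proc/modules format:
    name size refcount dependencies state address
    """
    counts = {}
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[0].startswith("nvidia"):
            counts[fields[0]] = int(fields[2])
    return counts


def nvidia_module_refcounts(path: str = "/proc/modules") -> Dict[str, int]:
    """Read the live module list from the kernel (Linux only)."""
    return module_refcounts(Path(path).read_text())
```

If `nvidia_module_refcounts()` reports a count of 0 for `nvidia_uvm`, nothing is holding that module and unloading it should not be blocked. A nonzero count usually means a CUDA process is still running and needs to be stopped first.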

Pavils_Jurjans · May 27, 2023, 7:53am

I can confirm that reloading the kernel module, as described by @ptrblck, works for me on Ubuntu 22.04.