Trouble with CUDA capability sm_86

Hi, all
I am trying to train a network on my NVIDIA RTX A4000. I receive the following error:
NVIDIA RTX A4000 with CUDA capability sm_86 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_70.
My environment:
pytorch-lightning 1.5.0
torch 1.11.0
torchaudio 0.11.0
torchmetrics 0.6.0
torchvision 0.12.0
CUDA Version: 11.4
Python 3.8.10

You’ve most likely installed the pip wheel or conda binary with the CUDA 10.2 runtime, which doesn’t support your Ampere GPU. Select CUDA 11.3 in the install setup here and it should work.
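
To see why the error message lists sm_37 through sm_70 but rejects sm_86, the compatibility rule can be sketched in a few lines. This is a simplified illustration, not PyTorch's own check: it assumes a binary built for sm_XY runs on a device with the same major version and an equal or newer minor version, and it ignores PTX JIT compilation. The helper names `parse_sm` and `is_supported` are made up for this example; `torch.version.cuda`, `torch.cuda.get_arch_list()` and `torch.cuda.get_device_capability()` are existing PyTorch calls and are only used if torch is importable.

```python
def parse_sm(arch):
    """Turn a string such as 'sm_86' into a (major, minor) tuple, e.g. (8, 6)."""
    suffix = arch.split("_")[1]
    # Keep digits only, so a suffix with a trailing letter still parses.
    number = int("".join(ch for ch in suffix if ch.isdigit()))
    return number // 10, number % 10


def is_supported(device_capability, arch_list):
    """Simplified check (an assumption of this sketch): some sm_XY entry has
    the same major version as the device and a minor version that is not
    newer than the device's."""
    dev_major, dev_minor = device_capability
    for arch in arch_list:
        if not arch.startswith("sm_"):
            continue  # skip PTX entries such as 'compute_80'
        major, minor = parse_sm(arch)
        if major == dev_major and minor <= dev_minor:
            return True
    return False


# Values taken from the error message in this thread.
reported_archs = ["sm_37", "sm_50", "sm_60", "sm_70"]
print(is_supported((8, 6), reported_archs))  # False

try:
    import torch
except ImportError:
    torch = None

# Only query the real installation when torch and a GPU are present.
if torch is not None and torch.cuda.is_available():
    print("built with CUDA:", torch.version.cuda)
    print("arch list:", torch.cuda.get_arch_list())
    cap = torch.cuda.get_device_capability(0)
    print("device capability:", cap)
    print("supported:", is_supported(cap, torch.cuda.get_arch_list()))
```

If the printed arch list contains no sm_8x entry, the installed binary was built without Ampere support and a different build is needed.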

I did that, but afterwards I received the following error:
UserWarning: CUDA initialization: CUDA unknown error - this may be due to an incorrectly set up environment, e.g. changing env variable CUDA_VISIBLE_DEVICES after program start. Setting the available devices to be zero. (Triggered internally at …/c10/cuda/CUDAFunctions.cpp:112.)
return torch._C._cuda_getDeviceCount() > 0
No CUDA runtime is found, using CUDA_HOME='/usr'

The "CUDA unknown error" is generic and could point to anything broken in your setup.
If you get stuck, reinstall the NVIDIA drivers, then reinstall the conda binaries and check again whether PyTorch can use the GPU.
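
One way to recheck the setup after reinstalling is to collect a few basic facts in one place. The following is only a rough sketch under some assumptions: `collect_cuda_diagnostics` is a name invented here, it treats a zero exit status from `nvidia-smi` as "the driver responds", and it reports the torch fields as `None` when torch cannot be imported. PyTorch also ships `python -m torch.utils.collect_env`, which gathers a more complete report.

```python
import os
import shutil
import subprocess


def collect_cuda_diagnostics():
    """Gather a few facts that help narrow down a CUDA initialization failure."""
    info = {}
    # The warning mentions this variable explicitly, so record its value.
    info["CUDA_VISIBLE_DEVICES"] = os.environ.get("CUDA_VISIBLE_DEVICES")

    # Check whether the driver utility exists and runs without error.
    smi = shutil.which("nvidia-smi")
    info["nvidia_smi_path"] = smi
    if smi is None:
        info["nvidia_smi_ok"] = False
    else:
        try:
            result = subprocess.run(
                [smi], capture_output=True, text=True, timeout=30
            )
            info["nvidia_smi_ok"] = result.returncode == 0
        except (OSError, subprocess.TimeoutExpired):
            info["nvidia_smi_ok"] = False

    # Ask PyTorch what it was built with and whether it can see a GPU.
    try:
        import torch
    except ImportError:
        info["torch_version"] = None
        info["torch_cuda_build"] = None
        info["cuda_available"] = None
    else:
        info["torch_version"] = torch.__version__
        info["torch_cuda_build"] = torch.version.cuda
        info["cuda_available"] = torch.cuda.is_available()
    return info


if __name__ == "__main__":
    for key, value in collect_cuda_diagnostics().items():
        print(f"{key}: {value}")
```

If `nvidia_smi_ok` is false, the problem most likely sits in the driver installation rather than in PyTorch itself.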

Hi @ptrblck, I had the same issue and reinstalled PyTorch as you suggested, but now I get the following error:

Process finished with exit code 137 (interrupted by signal 9: SIGKILL)

Can you please help me out with this?
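
For readers unfamiliar with this message: exit code 137 follows the common shell convention of 128 plus the signal number, so 137 - 128 = 9, which is SIGKILL on Linux. A small sketch of that arithmetic is below; `describe_exit_code` is a helper name made up for this example. A SIGKILL often, though not always, comes from the kernel's out-of-memory killer, so checking host memory usage is a reasonable first step.

```python
import signal


def describe_exit_code(code):
    """Map a shell-style exit code above 128 to the name of the signal
    that terminated the process, or return None if it does not apply."""
    if code > 128:
        signum = code - 128
        try:
            return signal.Signals(signum).name
        except ValueError:
            return None  # not a known signal number on this platform
    return None


print(describe_exit_code(137))  # SIGKILL (on Linux)
```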

Could you try to get a stacktrace via:

gdb --args python script.py args
...
run
...
bt

And post it here, please?

(venv) activelearning@activelearning:~/Documents/govardhan/invokeAI/InvokeAI/scripts$ gdb --args python invoke.py args
GNU gdb (Ubuntu 9.2-0ubuntu1~20.04) 9.2
Copyright (C) 2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:

Find the GDB manual and other documentation resources online at:

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from python…
(No debugging symbols found in python)
(gdb) run
Starting program: /home/activelearning/Documents/govardhan/invokeAI/venv/bin/python invoke.py args
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
path .
warning: Loadable section ".note.gnu.property" outside of ELF segments
[Detaching after fork from child process 2118905]
[New Thread 0x7fff86eaa700 (LWP 2118911)]
[New Thread 0x7fff866a9700 (LWP 2118912)]
[New Thread 0x7fff81ea8700 (LWP 2118913)]
[New Thread 0x7fff7f6a7700 (LWP 2118914)]
[New Thread 0x7fff7cea6700 (LWP 2118915)]
[New Thread 0x7fff7a6a5700 (LWP 2118916)]
[New Thread 0x7fff77ea4700 (LWP 2118917)]
usage: invoke.py [-h] [--laion400m LAION400M] [--weights WEIGHTS] [--conf CONF] [--model MODEL] [--sampler SAMPLER_NAME] [-F] [--free_gpu_mem]
[--precision PRECISION] [--from_file INFILE] [--outdir OUTDIR] [--prompt_as_dir] [--grid] [--embedding_path EMBEDDING_PATH]
[--no_restore] [--no_upscale] [--esrgan_bg_tile ESRGAN_BG_TILE] [--gfpgan_model_path GFPGAN_MODEL_PATH]
[--gfpgan_dir GFPGAN_DIR] [--web] [--web_develop] [--web_verbose] [--cors [CORS [CORS ...]]] [--host HOST] [--port PORT]
[--gui]
invoke.py: error: unrecognized arguments: args
usage: invoke.py [-h] [--laion400m LAION400M] [--weights WEIGHTS] [--conf CONF] [--model MODEL] [--sampler SAMPLER_NAME] [-F] [--free_gpu_mem]
[--precision PRECISION] [--from_file INFILE] [--outdir OUTDIR] [--prompt_as_dir] [--grid] [--embedding_path EMBEDDING_PATH]
[--no_restore] [--no_upscale] [--esrgan_bg_tile ESRGAN_BG_TILE] [--gfpgan_model_path GFPGAN_MODEL_PATH]
[--gfpgan_dir GFPGAN_DIR] [--web] [--web_develop] [--web_verbose] [--cors [CORS [CORS ...]]] [--host HOST] [--port PORT]
[--gui]
invoke.py: error: unrecognized arguments: args
[Thread 0x7fff81ea8700 (LWP 2118913) exited]
[Thread 0x7fff77ea4700 (LWP 2118917) exited]
[Thread 0x7fff7a6a5700 (LWP 2118916) exited]
[Thread 0x7fff7cea6700 (LWP 2118915) exited]
[Thread 0x7fff7f6a7700 (LWP 2118914) exited]
[Thread 0x7fff866a9700 (LWP 2118912) exited]
[Thread 0x7fff86eaa700 (LWP 2118911) exited]
[Inferior 1 (process 2118901) exited with code 0377]
(gdb) bt
No stack.
(gdb)

Your run fails because args is not a real argument: it is a placeholder that you should replace with the actual arguments to your script, or leave out if the script takes none. My example is only a template and shouldn't be used verbatim.
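
To make the placeholder explicit, here is a small sketch that assembles the gdb command line from a script name and its real arguments. `build_gdb_command` is a helper invented for this example, and the `--outdir outputs` value is only an illustration based on the usage message above. The `-batch`, `-ex` and `--args` options are standard gdb flags; with them gdb runs the program, prints a backtrace, and exits without an interactive session.

```python
import shlex


def build_gdb_command(script, script_args=()):
    """Build a non-interactive gdb invocation that runs a Python script
    and prints a backtrace afterwards."""
    return [
        "gdb", "-batch",
        "-ex", "run",   # start the program
        "-ex", "bt",    # print the backtrace once it stops
        "--args", "python", script, *script_args,
    ]


# Hypothetical example: pass a real option of the script instead of 'args'.
cmd = build_gdb_command("invoke.py", ["--outdir", "outputs"])
print(shlex.join(cmd))
# gdb -batch -ex run -ex bt --args python invoke.py --outdir outputs
```

Calling `build_gdb_command("invoke.py")` with no extra arguments gives the form to use when the script needs none.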