You’re right and I finally got it working. Thanks again for your help.
Were you able to narrow down the failing kernel? If so, which was it?
I tracked the failure to the embedding kernels (torch.nn.functional.embedding). Upgrading to the latest nightly (with sm_120 support) resolved that and allowed me to train/infer properly on the RTX 5080. I did run into a minor complication with PyTorch’s newer “safe unpickling” defaults (they impacted my weight conversion tools), but adding an allow-list for Fairseq classes fixed it.
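For anyone who hits the same safe-unpickling issue, here is a minimal sketch of the allow-list approach I used (the checkpoint path is a placeholder, and you should add whichever classes the weights_only loader reports as blocked in your case):
import torch
from argparse import Namespace

# Allow-list the non-tensor classes the checkpoint pickles (argparse.Namespace
# as an example; add the Fairseq classes named in the UnpicklingError message).
torch.serialization.add_safe_globals([Namespace])

# "checkpoint.pt" is a placeholder path used for illustration only.
state = torch.load("checkpoint.pt", map_location="cpu", weights_only=True)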
Thanks again—appreciate your help!
Hi, I had the same error as you, even after installing the nightly binaries. What helped was a small addition to the pip install command, so that matching torchaudio and torchvision get installed as well:
pip install --pre torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu128
I have a 5070 Ti. After installing, I ran torch.cuda.get_arch_list() and got the following:
['sm_75', 'sm_80', 'sm_86', 'sm_90', 'sm_100', 'sm_120', 'compute_120']
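For reference, a tiny sketch to check the same thing programmatically (it simply asserts that the Blackwell architecture is among the ones the installed wheel was built for):
import torch

# The nightly/cu128 wheels should list sm_120 for RTX 50-series (Blackwell) GPUs.
assert "sm_120" in torch.cuda.get_arch_list(), torch.cuda.get_arch_list()
print(torch.cuda.get_device_name(0))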
Any update so far? 'Cause the PyTorch Nightly preview version still won’t work on my RTX 5070 with CUDA 12.8.
It does work, as has already been confirmed many times in this thread and others. If you are encountering an issue, you would need to describe it in more detail.
Is there any PyTorch Docker image that supports sm_120?
Yes, pytorch/pytorch:2.7.0-cuda12.8-cudnn9-devel and pytorch/pytorch:2.7.0-cuda12.8-cudnn9-runtime support sm_120.
Besides that, the NGC PyTorch containers have supported Blackwell since their 25.01 release.
Hi, I’m on Windows and am having issues getting PyTorch to work properly with my 5090. I previously had several other 4000-series cards where PyTorch worked wonderfully. I initially tried to install using the get-started/locally page, but just pasting the configurator command did not work: all I received were messages stating everything was already satisfied. Next I uninstalled torch, torchaudio, and torchvision via pip uninstall. After that I tried to install using the command provided by the configurator, and this time it went through an install, but I still receive the warning that sm_120 is not compatible with the current version of PyTorch. Testing in a separate environment shows that a clean install will work. The issue is that this environment contains packages I cannot obtain anymore. How can I clear PyTorch out of an environment so I can get a clean install? I think this might be what a lot of people searching the internet are looking for when they say it is not working.
Thank you, I just got my NVIDIA RTX 5060 Ti working with these repositories.
Hi there, I have an RTX 5090 and I am having no luck installing PyTorch. Is it possible to get it working with a nightly build?
If so, can you point me in the right direction, please?
You can install our latest stable (2.7.0/1) or nightly binary with CUDA 12.8 using the install instructions from the install matrix.
You are using a PyTorch binary with CUDA <= 12.6 while PyTorch 2.7.0+ with CUDA 12.8+ is needed. Select it from the install matrix as explained in the post above and it will work.
As a quick smoke test you could run this directly after installing the latest stable or nightly binary:
import torch
print(torch.__version__)
print(torch.cuda.get_arch_list())
print(torch.randn(1).cuda())
Hi everyone,
I’m running into an issue using PyTorch on a brand new NVIDIA RTX 5080 GPU (sm_120) on Windows 11.
Here’s what I’ve tried so far:
- Installed the latest PyTorch nightlies with CUDA 12.8 support:
torch: 2.9.0.dev20250806+cu128
torchvision: 0.24.0.dev20250806+cu128
torchaudio: 2.8.0.dev20250806+cu128
- Python version: 3.10.11 (Win64)
- Driver: NVIDIA 550.80.8 (supports Blackwell/RTX 5080)
- CUDA version: 12.8
Verification:
print(torch.__version__)
→ 2.9.0.dev20250806+cu128
print(torch.version.cuda)
→ 12.8
print(torch.cuda.is_available())
→ True
print(torch.cuda.get_device_name())
→ NVIDIA GeForce RTX 5080
Problem:
- When running Stable Diffusion or any workflow involving torch.nn.functional.embedding, I get the error:
RuntimeError: CUDA error: no kernel image is available for execution on the device
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
This seems to suggest some CUDA kernels for sm_120 are still missing in the current nightly.
Questions:
- Is there a specific nightly build where all relevant CUDA kernels (especially embedding and SD-related ops) are now included for sm_120 on Windows?
- Is there any ETA for full Blackwell (sm_120, RTX 5080/5090) support in Windows nightlies?
- Any known workarounds besides waiting for new nightlies or switching to Linux?
Thanks in advance for your help!
Could you post a minimal and executable code snippet reproducing this issue, please?
I ran the minimal test below and it worked without any errors:
import torch
input_ids = torch.randint(0, 1000, (1, 10), device='cuda')
embedding = torch.nn.Embedding(1000, 64).cuda()
output = embedding(input_ids)
print(output)
So, the basic embedding kernel is working with my setup (torch 2.9.0.dev20250806+cu128, CUDA 12.8, RTX 5080).
However, when I run Stable Diffusion (AUTOMATIC1111 or ComfyUI), I get:
RuntimeError: CUDA error: no kernel image is available for execution on the device
Could it be a different CUDA kernel that Stable Diffusion needs, which is not yet compiled for sm_120?
Do you have a suggestion for a minimal test that could trigger the same error as Stable Diffusion?
This is good to hear and thanks for confirming PyTorch itself is working fine!
This could be the case and I would not know which custom kernel is used or why.
You could try to narrow down the kernel via:
cuda-gdb --args python script.py args
set cuda api_failures stop
...
r
...
bt
which should show the failing kernel.
If no custom kernel is used, your Stable Diffusion environment might also use another PyTorch binary, which could be an older build or a new release using an older CUDA toolkit.
In any case, the debugging step should show which kernel fails and we can keep debugging from there.
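One quick way to rule out the wrong-binary case is to print which interpreter and which PyTorch build your Stable Diffusion environment actually loads; a minimal sketch (run it with the same Python the WebUI uses, e.g. from its venv):
import sys
import torch

# Shows which interpreter and which PyTorch binary this environment picks up.
print(sys.executable)
print(torch.__version__, torch.version.cuda)
print(torch.cuda.get_arch_list())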
Thanks for your help, ptrblck!
- The basic embedding kernel works fine with my setup (PyTorch 2.9.0.dev20250806+cu128, CUDA 12.8, RTX 5080, Python 3.10.11, Windows 11).
- The error only occurs when running Stable Diffusion (AUTOMATIC1111 WebUI), with all extensions disabled.
- I’ve installed CUDA Toolkit 13.0 (full install), but it appears that cuda-gdb is not available on Windows (even after install, there is no cuda-gdb.exe anywhere on my system). From NVIDIA docs and forums, cuda-gdb seems to be Linux-only.
Is there any alternative way to identify the failing kernel or debug this on Windows, or is there another diagnostic I can try?
Could the issue be with a custom CUDA kernel inside Stable Diffusion or one of its dependencies that isn’t yet compiled for sm_120 (Blackwell)?
Thanks again for your guidance!
Hello everyone,
I’m personally running into issues using Stable Diffusion on my NVIDIA GeForce RTX 5090 (SM_120). Every attempt results in CUDA errors stating “no kernel image is available for execution on the device”.
Here’s what I have tried so far in detail:
- Standard installation using Automatic1111’s WebUI
  - Cloned the repository, created a virtual environment, and ran launch.py.
  - Installed all dependencies automatically.
  - The model starts downloading, but fails to load after attempting to initialize the weights.
- CUDA / PyTorch attempts
  - Tried the official PyTorch releases with CUDA 12.1.
  - Tested nightly builds with CUDA 12.8 support.
  - Set environment variables TORCH_USE_CUDA_DSA=1 and CUDA_LAUNCH_BLOCKING=1 for debugging.
- Community workarounds
  - Followed suggestions in the PyTorch forums and GitHub discussions, including attempts with modified launch arguments and settings.
For context, I run Stable Diffusion as part of a larger project called Ameca, a personal automation/AI assistant built on Node.js and WA-Automate-NodeJS, which integrates image generation features. Stable Diffusion is intended to provide generated visuals for Ameca’s automated tasks.
What’s more, dozens to hundreds of users have asked me to re-enable this feature, so SM_120 support would have a significant impact once it becomes functional.
Despite these efforts, the model still fails to load. From my personal experience, it seems that SM_120 support is still not officially available, and users with RTX 50-series GPUs may encounter the same issues.
It would be extremely helpful if the maintainers could clarify the current status of SM_120 support or suggest any reliable workaround.
Thanks,
Tom
All of our nightly and stable PyTorch binaries (2.7.0+ with CUDA 12.8+) support Blackwell GPUs, as was also confirmed in this thread.
The same applies as before: try to narrow down which kernel fails, as PyTorch itself works and the issue is thus most likely coming from a 3rd-party package you are installing and using in your workflow.
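As a starting point, you can re-run the quick smoke test from earlier in the thread inside the exact environment your project launches (a sketch; use that environment’s own interpreter):
import torch

print(torch.__version__, torch.version.cuda)
print(torch.cuda.get_arch_list())
# If this basic op fails, the PyTorch binary itself is the problem;
# if it passes, the failing kernel most likely comes from a 3rd-party package.
print(torch.randn(1, device="cuda"))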