Hi, I am constantly getting the CUDA capability sm_120 error on Windows 11 despite installing the right packages:
Windows 11
GameReady driver 576.88 (when I type nvidia-smi I get CUDA 12.9 listed)
Python 3.10.6
torch 2.7.1+cu128 (installed with the command from the matrix on the PyTorch site)
Can somebody help me please?
Error:
…\stable-diffusion-webui\venv\lib\site-packages\torch\cuda\__init__.py:215: UserWarning:
NVIDIA GeForce RTX 5070 Ti with CUDA capability sm_120 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_50 sm_60 sm_61 sm_70 sm_75 sm_80 sm_86 sm_90.
The error message indicates the usage of a PyTorch binary built with CUDA <=12.6, since all of our binaries with CUDA 12.8 support Blackwell architectures. You are most likely mixing different binaries and are still using previously installed ones in your environment. This should also be visible by printing torch.__version__ and torch.version.cuda inside the failing script.
When I execute python -c "import torch; print(torch.version.cuda)" I get 12.8. With python -c "import torch; print(torch.__version__)" I get 2.7.1+cu128. However, it is true that I had previously installed an older version than 12.7. Before installing 2.7.1 I uninstalled the older PyTorch and stable-diffusion-webui, and then reinstalled stable-diffusion-webui. Any ideas?
I don’t know what might cause the mismatches in your environment. As another check, you could print torch.cuda.get_arch_list() and post the output here.
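For reference, you can compare the reported list against one that includes Blackwell kernels; a minimal sketch (the example lists below are illustrative, not taken from your machine):

```python
def supports_blackwell(arch_list):
    """True if the build ships compiled kernels for Blackwell GPUs (sm_100 / sm_120)."""
    return any(arch in ("sm_100", "sm_120") for arch in arch_list)

# Typical arch list of an older (pre-cu128) binary:
old_build = ["sm_50", "sm_60", "sm_61", "sm_70", "sm_75",
             "sm_80", "sm_86", "sm_90"]
# Typical arch list of a cu128 binary:
new_build = old_build + ["sm_100", "sm_120"]

print(supports_blackwell(old_build))  # False
print(supports_blackwell(new_build))  # True
```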
Thank you for providing this output!
It matches the support matrix for our CUDA 12.8 builds and, as you can see, sm_100 as well as sm_120 are already supported in this environment.
The original error is thus pointing to another PyTorch binary missing sm_100 and sm_120:
The current PyTorch install supports CUDA capabilities sm_50 sm_60 sm_61 sm_70 sm_75 sm_80 sm_86 sm_90.
so you would need to check which Python environment is used and which PyTorch version was installed there.
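To rule out a second environment being picked up, a quick check you can run with whichever interpreter actually launches the failing app (the try/except is just so the script still reports something if torch is missing):

```python
import sys

# Shows which interpreter is running and which torch installation it imports.
print(sys.executable)
try:
    import torch
    print(torch.__version__)   # e.g. 2.7.1+cu128
    print(torch.version.cuda)  # e.g. 12.8
    print(torch.__file__)      # the site-packages directory this torch lives in
except ImportError:
    print("torch is not installed in this environment")
```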
Since the env you are currently testing shows a PyTorch binary with Blackwell support, you could also run additional smoke tests such as:
import torch
# Allocate a tensor on the GPU and run a matmul to confirm
# compiled kernels exist for this architecture
x = torch.randn(64, 64).cuda()
y = torch.matmul(x, x)
print(y.shape)  # torch.Size([64, 64]) if the kernel launched successfully
I get torch.Size([64, 64]). Interestingly, I get these errors when I install stable diffusion via git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui and start webui-user.bat. I then installed ComfyUI via the .exe installer and can now generate something with it. So the problem is somehow in stable-diffusion-webui and not in my version of PyTorch?
Yes, your PyTorch installation with CUDA 12.8 works fine and does support your Blackwell GPU. stable-diffusion-webui seems to install/use a PyTorch binary which is using older CUDA runtime dependencies.
Title: RTX 5070 Ti (sm_120) + PyTorch 2.7.0 + CUDA 12.8 = Still No Kernel Support on Windows
Hi everyone,
I’m writing this after spending weeks testing PyTorch with a brand-new RTX 5070 Ti, which uses CUDA compute capability sm_120. Despite claims that PyTorch 2.7.0 with CUDA 12.8 now supports the Blackwell architecture, I’m still facing critical failures that suggest this support is either incomplete or platform-limited.
My Setup
GPU: NVIDIA GeForce RTX 5070 Ti (sm_120)
OS: Windows 11 (native, not WSL)
Python: 3.10.13 (Anaconda)
CUDA Toolkit: 12.8 installed
Torch: 2.7.0.dev2025xxxx with cu128
TorchVision / Xformers: reinstalled with --reinstall flags
Projects:
Stable Diffusion WebUI (Automatic1111 dev branch)
ONNXRuntime via custom GUI (VisoMaster)
The Problem
Even with the correct nightly build installed, running anything on the GPU leads to:
CUDA error: no kernel image is available for execution on the device
This occurs consistently during basic F.embedding, conv2d, and CLIP initializations — core components of both Stable Diffusion and ONNX workflows.
I’ve confirmed this isn’t due to outdated binaries:
import torch
print(torch.__version__)          # 2.7.0.dev2025xxxx
print(torch.version.cuda)         # 12.8
print(torch.cuda.get_arch_list()) # ['sm_50', ..., 'sm_120']
Yet despite this, GPU operations fail due to missing kernel images — as if the binaries, although aware of sm_120, still lack the actual compiled support for Windows builds.
My Questions for the Dev Team
Does PyTorch currently publish Windows-native binaries with compiled support for sm_120?
If not, when can we expect those builds to be available through pip or conda without compiling from source?
Is there an environment checklist to ensure that no fallback legacy binaries are interfering, especially within Anaconda environments?
Should users be installing via WSL2 to bypass these limitations until native Windows support catches up?
Summary
PyTorch may technically support sm_120 in source and select Linux builds, but practical deployment across platforms (especially Windows) appears to be lagging.
If there’s a tested configuration known to work for Blackwell GPUs on Windows without building PyTorch from source, that info would be incredibly valuable to the community.
Thanks in advance — I’m happy to share logs, screenshots, and environment specs if needed.
All of our binaries built with CUDA 12.8 support Blackwell architectures as also confirmed in this previous post.
As you can see in this thread, some 3rd-party installers might use or install an older PyTorch binary, and you could check if that’s the case. I’ve also posted code for a quick smoke check in this thread to validate whether the currently used build supports your GPU.
I ran the smoke test script on a Windows 11 machine with CUDA 12.8 and an RTX 5070 Ti (Blackwell). The result was positive — sm_120 was successfully detected by PyTorch.
However, during actual inference with real models (e.g. RetinaFace, Stable Diffusion), I consistently run into errors such as: no kernel image available for execution on the device and RuntimeError: CUDA error: unknown error.
These issues suggest that, although detection works, the current builds don’t include precompiled kernels for sm_120, at least under Windows.
If there’s any official or community-supported workaround, or if a custom build exists that includes full support for Blackwell GPUs (even experimental), I’d be very happy to try it.
I don’t think this is a valid conclusion since you already confirmed PyTorch is able to utilize your GPU by allocating tensors and executing operations (assuming you used my previously shared code snippet).
As mentioned before the issue is most likely caused by your environment where a 3rd party could use another (older) PyTorch build or could even downgrade it. As @archai explained this was exactly the case when stable-diffusion-webui was used.
Alternatively, you might also be using a custom CUDA extension that was not compiled for Blackwell GPUs, and you could narrow down which kernel fails with e.g. cuda-gdb.
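If a custom extension is the suspect, one way to inspect which SASS architectures were compiled into its binary is the CUDA toolkit's cuobjdump tool; a rough sketch (the extension filename is hypothetical, and cuobjdump must be on PATH for the lookup to succeed):

```python
import shutil
import subprocess

def cubin_arches_cmd(binary_path):
    """Build the cuobjdump invocation that lists the ELF images
    (one per compiled SASS architecture) embedded in a CUDA binary."""
    return ["cuobjdump", "--list-elf", binary_path]

# Hypothetical extension binary; substitute the real .pyd/.so of your extension.
cmd = cubin_arches_cmd("my_extension.cp310-win_amd64.pyd")
if shutil.which("cuobjdump") is not None:
    result = subprocess.run(cmd, capture_output=True, text=True)
    print(result.stdout)  # look for sm_120 entries in the listed images
else:
    print("cuobjdump not found; run manually:", " ".join(cmd))
```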
All of our PyTorch binaries built with CUDA 12.8 already support Blackwell GPUs.
Hello,
I am facing the same issue as above on Windows 11:
NVIDIA GeForce RTX 5070 Ti with CUDA capability sm_120 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_50 sm_60 sm_61 sm_70 sm_75 sm_80 sm_86 sm_90.
Hi, I am constantly getting the CUDA capability sm_120 error on Windows 11 despite installing the right packages:
Windows 11
GameReady driver 581.57
Python 3.11.14
CUDA 12.8
torch 2.10.0.dev20251101+cu128
torchvision 0.25.0.dev20251101+cu128
Can you help me please?
C:\ProgramData\anaconda3\envs\yolov12\Lib\site-packages\torch\cuda\__init__.py:235: UserWarning:
NVIDIA GeForce RTX 5070 Ti with CUDA capability sm_120 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_50 sm_60 sm_61 sm_70 sm_75 sm_80 sm_86 sm_90.
If you want to use the NVIDIA GeForce RTX 5070 Ti GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/
RuntimeError: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
Your environment is most likely using a PyTorch binary built with an older CUDA toolkit. E.g. we have seen these types of issues caused by 3rd-party apps such as ComfyUI, which create their own virtual environment and install the wrong binaries. Check your torch.__path__ to see which installation is used in which env.
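You can also locate the torch package a given interpreter would import without importing it at all (useful when the import itself already prints the warning); a small sketch using the standard library:

```python
import importlib.util
import sys

# Locate the torch package this interpreter would import, without importing it.
print(sys.executable)
spec = importlib.util.find_spec("torch")
if spec is not None:
    # For a regular package this is equivalent to torch.__path__
    print(list(spec.submodule_search_locations))
else:
    print("torch is not installed for this interpreter")
```

Run this once with your system Python and once with the interpreter inside the app's virtual environment; differing paths confirm two separate installations.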