RTX 5070 Ti (Blackwell) + PyTorch Nightly + Triton: Still Getting "sm_120 is not defined for option 'gpu-name'" Error

Hello everyone,

I’m trying to get ComfyUI working on a new RTX 5070 Ti (Blackwell) GPU under Windows 11.
I’ve already upgraded to the latest PyTorch Nightly with CUDA 12.8 support (which should include Blackwell support).
However, I keep running into this error when running any Triton-based code (e.g., ComfyUI, Stable Diffusion):
ptxas fatal : Value 'sm_120' is not defined for option 'gpu-name'
RuntimeError: ptxas failed with error code 4294967295
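
A quick way to narrow down whether the failure sits in the PyTorch wheel or in Triton/ptxas is a check along these lines (a minimal sketch; a single GPU at device index 0 is assumed):

    import torch

    # Does the installed PyTorch build know about this GPU at all?
    print(torch.__version__, torch.version.cuda)   # e.g. a 2.8.0.dev...+cu128 build / 12.8
    print(torch.cuda.get_device_name(0))           # should report the RTX 5070 Ti
    print(torch.cuda.get_device_capability(0))     # Blackwell reports (12, 0)
    print(torch.cuda.get_arch_list())              # the cu128 wheels should list 'sm_120'
    # If 'sm_120' is missing here, the wheel itself lacks Blackwell kernels; if it is
    # present, the ptxas failure points at the ptxas binary Triton uses, not at PyTorch.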

My environment:

  • GPU: NVIDIA GeForce RTX 5070 Ti (Blackwell)
  • OS: Windows 11 Enterprise x64
  • Python: 3.10.6
  • PyTorch: 2.8.0.dev20250528+cu128 (Nightly)
  • CUDA: 12.9 Toolkit (runtime in PyTorch: 12.8)
  • Triton: 3.2.0 (from pip), also tried latest main from git clone
  • ComfyUI: v0.3.34

What I’ve tried:

  • Verified single PyTorch install: pip show torch (only one, correct version, as above)
  • Upgraded PyTorch to nightly with CUDA 12.8 as per PyTorch Get Started guide
  • Installed CUDA Toolkit 12.9 (system-wide)
  • Upgraded Triton (pip install --upgrade --force-reinstall triton); only 3.2.0 is available via pip (see the ptxas check sketched after this list)
  • Also did git clone https://github.com/triton-lang/triton.git and pip install . in that folder (which installs 3.2.0)
  • Did NOT yet try building Triton from a specific tag such as v3.3.1 (only used the default/main branch)
  • Cleaned Python caches (__pycache__), cleared ComfyUI temp folders
  • Rebooted after every step
  • Drivers updated: NVIDIA 576.52 (latest as of today)
  • All software launched from the same Python environment (where python and where pip point to the same folder)
  • Environment variables (CUDA_PATH, etc.) are correct
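
Since the fatal error comes from ptxas rather than from PyTorch itself, one more thing worth checking is which ptxas binary Triton actually invokes and whether it is new enough to know sm_120 (i.e. from CUDA 12.8 or later). A rough sketch of that check follows; the bundled path and the TRITON_PTXAS_PATH override are assumptions about how the installed Triton build is laid out, and Windows community wheels may differ:

    import os
    import subprocess
    import triton

    # ptxas from CUDA < 12.8 does not know sm_120, which matches the fatal error above.
    # ASSUMPTION: recent Triton wheels bundle ptxas under backends/nvidia/bin;
    # Windows builds may ship it elsewhere or under a different name.
    bundled = os.path.join(os.path.dirname(triton.__file__),
                           "backends", "nvidia", "bin", "ptxas.exe")
    for candidate in (os.environ.get("TRITON_PTXAS_PATH"), bundled):
        if candidate and os.path.exists(candidate):
            print("Triton ptxas candidate:", candidate)
            subprocess.run([candidate, "--version"])  # should report CUDA 12.8 or newer
            break

If the reported version is older than 12.8, pointing TRITON_PTXAS_PATH at the ptxas shipped with the system CUDA 12.9 toolkit is one possible workaround, assuming the Triton build in use honours that variable.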

Questions / Help Needed:

  1. Is there an official Triton 3.3.1 pip wheel for Windows? (I only see 3.2.0)
  2. Is it necessary to build Triton from source using the exact 3.3.1 tag for Blackwell support?
    If yes, can anyone confirm that ComfyUI / Stable Diffusion / TorchInductor work after that step, on Windows?
  3. Do I need to point PyTorch or Triton to any specific binaries/libraries from CUDA 12.8/12.9?
  4. Any other tweaks needed for Blackwell GPUs, e.g. editing config files, workarounds, etc.?

Additional info:

  • All packages and dependencies are installed in a clean Python 3.10 environment.
  • No old PyTorch, CUDA, or Triton left over (I double-checked).
  • Still getting the 'sm_120' is not defined for option 'gpu-name' error (see full log below).

Any help or pointers are very much appreciated.
If someone already has Blackwell GPUs working with the PyTorch Nightly + Triton + CUDA 12.8+ toolchain on Windows, please share your working setup and exact install steps!

Thank you!

I don’t think Triton is officially supported on Windows (see e.g. this issue), so I’m unsure whether you are installing custom-built binaries from other users (and I hope you have verified them). Are you seeing the same issue on Linux or WSL?

Hello,
Thanks for the link and the clarification. I understand Triton is not officially supported on Windows, but many users (myself included) have successfully used Triton pip wheels (up to v3.2.0) with ComfyUI/Stable Diffusion on Windows for a long time. The only thing really missing is updated pip wheels for Triton 3.3.x+ with Blackwell support.

So, even without official support, the main request from the Windows AI community is for updated pip wheels to keep our pipelines working on new NVIDIA cards.

Hi everyone,

I’d like to add my experience to this thread, as I’m encountering a very similar issue with my RTX 50 series GPU.

I have an NVIDIA GeForce RTX 5070.
Its compute capability is 12.0 (sm_120, Blackwell architecture).
My operating system is Windows 10, Version 22H2 (OS Build 19045.4412).
My NVIDIA driver version is 552.44.

I am trying to run Fooocus (v2.5.5), and the exact error message I receive when the software attempts to utilize the GPU is:
RuntimeError: CUDA error: no kernel image is available for execution on the device

I’ve attempted with the following PyTorch versions within my Fooocus environment:

  • The default version that came with Fooocus (torch-2.1.0+cu121)
  • A manual update to torch-2.5.1+cu121

In both cases, the problem persists. It’s helpful to see other users with similar hardware reporting this.
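
For what it’s worth, both cu121 builds predate Blackwell and therefore don’t ship sm_120 kernels, which is consistent with the "no kernel image" error. A quick check run inside the Fooocus Python environment (a minimal sketch) shows which architectures the installed wheel was actually compiled for:

    import torch

    # The cu121 wheels do not include sm_120, so a Blackwell card finds no matching kernel image.
    print(torch.__version__, torch.version.cuda)
    print(torch.cuda.get_arch_list())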

Is there any update on the roadmap or ETA for stable PyTorch support for the sm_120 architecture? Any information regarding this compatibility would be greatly appreciated by users with RTX 50 series hardware.

Thank you for your time and efforts.

Blackwell is already supported in the nightly binaries and in the latest stable 2.7.0 release. Select the PyTorch binary built with CUDA 12.8 from our install matrix and it will work.
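
For reference, the CUDA 12.8 builds can be installed with a command along the lines of pip install torch --index-url https://download.pytorch.org/whl/cu128 (or the nightly equivalent with --pre and the nightly/cu128 index; check the install matrix for the exact, current command). A short end-to-end check then confirms the wheel actually ships Blackwell kernels (a minimal sketch; single GPU assumed):

    import torch

    # A kernel launch only succeeds if the installed wheel ships code for this architecture.
    print(torch.__version__, torch.version.cuda)   # expect a +cu128 build
    print(torch.cuda.get_arch_list())              # should include 'sm_120'
    x = torch.randn(1024, 1024, device="cuda")
    y = x @ x
    torch.cuda.synchronize()
    print("matmul OK:", y.shape)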