PyTorch support for sm120

AiD3veloper · April 5, 2026, 6:42pm

For anyone running CSM-1B (Sesame’s TTS model) on RTX 5090 — I got it working at 0.46x RTF with CUDA graph replay using nightly cu128. Had to patch HF Transformers in 4 places: StaticCache index_copy_ → slice assignment, 3x arange fixes in modeling_csm.py, and cudagraph_mark_step_begin calls in the generate loop. Full pipeline + auto-patcher here: https://github.com/D3velop-llc/csm-rtx5090

Vidusahan_Perera · April 29, 2026, 9:12am

This worked on my RTX 5050 too. Thank you very much.

Vidusahan_Perera · April 29, 2026, 11:18am

RTX 5050 + CUDA 12.8 + PyTorch (Working Setup)

Hi everyone,

I recently set up PyTorch on a machine with an NVIDIA RTX 5050 Laptop GPU, and wanted to share a working configuration since older/stable builds didn’t properly support this GPU architecture.

Working Setup

GPU: NVIDIA GeForce RTX 5050 Laptop GPU
CUDA Toolkit: 12.8
Python: 3.11
PyTorch: Nightly build (CUDA 12.8)

Installation Command

pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu128

Verification

import torch

print("Torch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())

if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))

Notes

Nightly builds with cu128 worked correctly and enabled full GPU acceleration.

Hope this helps anyone trying to get newer RTX GPUs working with PyTorch.

ptrblck · April 29, 2026, 1:17pm

That’s wrong as all of our binaries using CUDA >= 12.8 support the Blackwell architecture.
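One way to check which CUDA version an installed build was compiled against is `torch.version.cuda`, which is a string such as "12.8" (or None on CPU-only builds). Below is a minimal sketch, assuming the 12.8 threshold mentioned above. The helper name `cuda_at_least` is made up for illustration and is not part of PyTorch.

```python
def cuda_at_least(version, minimum=(12, 8)):
    """Return True if a CUDA version string like "12.8" is >= minimum.

    Hypothetical helper for illustration; not a PyTorch API.
    """
    if not version:  # torch.version.cuda is None on CPU-only builds
        return False
    parts = version.split(".")
    major = int(parts[0])
    minor = int(parts[1]) if len(parts) > 1 else 0
    return (major, minor) >= minimum


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed in this environment")
    else:
        print("Torch version:", torch.__version__)
        print("Built with CUDA:", torch.version.cuda)
        print("CUDA >= 12.8:", cuda_at_least(torch.version.cuda))
```

Run it with the same Python interpreter the application uses (for a portable ComfyUI install, that is the bundled python.exe), otherwise it may report a different environment.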

Vidusahan_Perera · April 29, 2026, 2:40pm

Yes, I corrected the mistake.

Fasmanuel2002 · May 6, 2026, 8:58am

Hi Sean_V, for this you need PyTorch 2.11.0+cu128, because previous versions didn’t have this support.

Noctropolitan · May 21, 2026, 11:21am

Hello. Sorry to barge in. I feel like a total intruder because I’m not a developer, just a ComfyUI hobby user. I generate stuff to have fun and to illustrate my SillyTavern roleplay stories.

I do have a 5080 too, and I think I have tried all the PyTorch/CUDA versions, and the performance in AI-related stuff is… well, laughable. I use Windows.
It either forces bf16/fp16, rolls back to sm_90, or does a long list of other things. I’ve seen people report SDXL image generation benchmarks of up to 20 it/s with allegedly the same GPU as mine.
I’m getting 2.

So after two months of asking in ComfyUI communities, updating everything, testing for possible PCI/power malfunctions, trying the so-called “comfyui for blackwell” repository, updating CUDA, drivers, and Python, testing flash/sage attention, and trying every combination of launch bat flags I could think of to rule everything out, I always end up concluding that there are no native PyTorch wheels with sm_120 kernels for Windows.

So I literally came here to ask at the source. Am I doing something wrong?

(I tried creating a dual boot with Linux but… it wasn’t for me, even with all the drivers: too many errors, things that didn’t work, random PC freezes…)

Is there any hope for simple users like me?

ptrblck · May 21, 2026, 12:49pm

Most likely yes. It’s as simple as installing any build with CUDA 12.8 from our install matrix. If ComfyUI or another application uninstalls these builds, please create issues against their repositories as it’s out of our control if 3rd party libs wipe our binaries.

Noctropolitan · May 21, 2026, 1:57pm

I actually have a command in my launch bat that forces this in case any custom node or ComfyUI wipes it:
\python_standalone\python.exe -m pip install torch==2.10.0+cu128 torchvision==0.25.0+cu128 torchaudio==2.10.0+cu128 --index-url https://download.pytorch.org/whl/cu128

Each time I launch ComfyUI, it takes a few seconds longer to boot, but this way I know for certain that I have the supposedly best version. The issue persists, though, and it happens in any workflow, no matter which model is loaded. When I bought my GPU four months ago I tried to generate a 10-second video with LTX2.3 and it took half an hour.

PCI is OK, power is not limited and is set to high-performance mode on my PC, and everything that folks on Reddit told me to check has been checked and is fine. I even asked Gemini and Claude, but they think my card just came out and is so new that there is nothing that can fix the issue.

ptrblck · May 21, 2026, 4:23pm

In this case you should already use the correct version and you can check the supported architectures via print(torch.cuda.get_arch_list()).
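As a rough sketch of that check, the snippet below compares the device's compute capability against the architectures the installed build was compiled for. `torch.cuda.get_arch_list()` and `torch.cuda.get_device_capability()` are real PyTorch calls. The helper `has_native_kernels` is a made-up name for illustration, and it only looks for an exact `sm_XY` entry, so it ignores any PTX (`compute_XY`) entries that could be JIT-compiled.

```python
def has_native_kernels(arch_list, capability):
    """Return True if arch_list has an exact sm_XY entry for this device.

    Hypothetical helper for illustration; capability is a (major, minor)
    tuple, e.g. (12, 0) maps to "sm_120".
    """
    major, minor = capability
    return f"sm_{major}{minor}" in arch_list


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed in this environment")
    else:
        if not torch.cuda.is_available():
            print("CUDA is not available in this build/environment")
        else:
            archs = torch.cuda.get_arch_list()
            cap = torch.cuda.get_device_capability(0)
            print("GPU:", torch.cuda.get_device_name(0))
            print("Device capability:", cap)
            print("Build arch list:", archs)
            print("Native kernels for this GPU:", has_native_kernels(archs, cap))
```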

Assuming sm_120 is shown then the correct kernels are already running. If you are seeing performance issues you might want to profile your code via e.g. nsys to narrow down the bottleneck which could be unrelated to the GPU (e.g. data loading).
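Before reaching for a full profiler, a crude sanity check can help separate "the GPU itself is slow" from "something else in the pipeline is slow". The sketch below times a large fp16 matrix multiply with CUDA events and converts the result to TFLOPS. It is not a substitute for nsys. The matrix size, iteration count, and the helper names `tflops` and `time_matmul` are arbitrary choices for illustration.

```python
def tflops(n, iters, seconds):
    """Throughput of `iters` n x n matmuls (2 * n^3 FLOPs each) in TFLOPS."""
    return 2 * n**3 * iters / seconds / 1e12


def time_matmul(n=8192, iters=20):
    """Return seconds spent on `iters` fp16 matmuls on the current GPU."""
    import torch

    a = torch.randn(n, n, device="cuda", dtype=torch.float16)
    b = torch.randn(n, n, device="cuda", dtype=torch.float16)
    for _ in range(3):  # warm-up so one-time setup cost is not timed
        a @ b
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(iters):
        a @ b
    end.record()
    torch.cuda.synchronize()  # wait for the GPU before reading the timers
    return start.elapsed_time(end) / 1000.0  # elapsed_time returns ms


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed in this environment")
    else:
        if torch.cuda.is_available():
            n, iters = 8192, 20
            seconds = time_matmul(n, iters)
            print(f"{tflops(n, iters, seconds):.1f} TFLOPS (fp16 matmul)")
        else:
            print("CUDA is not available in this build/environment")
```

If this number looks reasonable for the card but the application is still slow, the bottleneck is more likely outside the raw GPU kernels (model offloading, data loading, memory pressure), which is where a profiler trace becomes useful.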