For anyone running CSM-1B (Sesame’s TTS model) on RTX 5090 — I got it working at 0.46x RTF with CUDA graph replay using nightly cu128. Had to patch HF Transformers in 4 places: StaticCache index_copy_ → slice assignment, 3x arange fixes in modeling_csm.py, and cudagraph_mark_step_begin calls in the generate loop. Full pipeline + auto-patcher here: https://github.com/D3velop-llc/csm-rtx5090
This worked on my RTX 5050 too. Thank you very much.
RTX 5050 + CUDA 12.8 + PyTorch (Working Setup)
Hi everyone,
I recently set up PyTorch on a machine with an NVIDIA RTX 5050 Laptop GPU, and wanted to share a working configuration since older/stable builds didn’t properly support this GPU architecture.
Working Setup
- GPU: NVIDIA GeForce RTX 5050 Laptop GPU
- CUDA Toolkit: 12.8
- Python: 3.11
- PyTorch: Nightly build (CUDA 12.8)
Installation Command
pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu128
Verification
import torch
print("Torch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
Notes
- Nightly builds with cu128 worked correctly and enabled full GPU acceleration.
Hope this helps anyone trying to get newer RTX GPUs working with PyTorch.
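A passing `torch.cuda.is_available()` check only shows that the driver and runtime are visible; it does not by itself prove that kernels for the GPU's architecture can run. A minimal sketch of a slightly stronger check follows, assuming a single GPU at the default device. The helper name `gpu_smoke_test` and the matrix size are hypothetical and not from the post.

```python
def gpu_smoke_test(size=256):
    """Run one small matmul on the GPU and return a short status string.

    Hypothetical helper: the name, size, and return strings are illustrative.
    """
    try:
        import torch  # imported lazily so the helper also works as a plain check
    except ImportError:
        return "torch not installed"

    if not torch.cuda.is_available():
        return "CUDA not available"

    try:
        x = torch.randn(size, size, device="cuda")
        y = x @ x                 # launches a real kernel on the device
        torch.cuda.synchronize()  # surface asynchronous launch errors here
    except RuntimeError as exc:
        # A build without kernels for this architecture is expected to fail here.
        return f"kernel launch failed: {exc}"
    return "ok"


if __name__ == "__main__":
    print(gpu_smoke_test())
```

If this prints anything other than `ok`, the message narrows down whether the problem is the install, the driver, or the kernels in the build.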
That’s wrong as all of our binaries using CUDA >= 12.8 support the Blackwell architecture.
Yes, I updated the post to fix the mistake.
Hi Sean_V, for this you need PyTorch 2.11.0+cu128, because previous versions didn’t include it.
Hello. Sorry to barge in. I feel like a total intruder because I’m not a developer, just a ComfyUI hobby user. I generate stuff to have fun and to illustrate my Sillytavern roleplay stories.
I have a 5080 too, and I think I have tried every PyTorch/CUDA version, and the performance in AI-related stuff is…well, laughable. I use Windows.
It either forces bf16/fp16, rolls back to sm_90, and a long list of other things. I’ve seen people report SDXL image-generation benchmarks of up to 20 it/s with allegedly the same GPU as mine.
I’m getting 2.
So after two months of asking in ComfyUI communities, updating everything, testing for possible PCI/power malfunctions, trying the so-called “comfyui for blackwell” repository, updating CUDA, drivers, and Python, testing flash/sage attention, and trying every combination of launch .bat flags I could think of to rule everything out, I always end up at the conclusion that there are no native PyTorch wheels with sm_120 kernels for Windows.
So I literally came here to ask at the source. Am I doing something wrong?
(I tried creating a dual boot with Linux but…it wasn’t for me: even with all the drivers, there were too many errors, things that didn’t work, random PC freezes…)
Is there any hope for simple users like me?
Most likely yes. It’s as simple as installing any build with CUDA 12.8 from our install matrix. If ComfyUI or another application uninstalls these builds, please create issues against their repositories as it’s out of our control if 3rd party libs wipe our binaries.
I actually have a line in my launch .bat that forces this in case any custom node or ComfyUI itself overwrites it:
\python_standalone\python.exe -m pip install torch==2.10.0+cu128 torchvision==0.25.0+cu128 torchaudio==2.10.0+cu128 --index-url https://download.pytorch.org/whl/cu128
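Whether a pinned build like this is still the one installed can be checked from its local version tag. The following is a minimal sketch, assuming the `+cuXYZ` suffix encodes the CUDA version with the last digit as the minor version (so `cu128` means 12.8). The helper names `cuda_tag` and `built_for_cuda_at_least` are hypothetical.

```python
import re


def cuda_tag(version):
    """Return (major, minor) parsed from a version such as '2.10.0+cu128'.

    Returns None for builds without a '+cuXYZ' suffix (for example '+cpu').
    Assumption: the last digit of the tag is the CUDA minor version.
    """
    match = re.search(r"\+cu(\d{2,})$", version)
    if match is None:
        return None
    digits = match.group(1)
    return int(digits[:-1]), int(digits[-1])


def built_for_cuda_at_least(version, minimum=(12, 8)):
    """True if the version string carries a CUDA tag of at least `minimum`."""
    tag = cuda_tag(version)
    return tag is not None and tag >= minimum


if __name__ == "__main__":
    for v in ("2.10.0+cu128", "2.7.0+cu118", "2.10.0+cpu"):
        print(v, built_for_cuda_at_least(v))
```

In a live environment the string to test would be `torch.__version__`; `torch.version.cuda` also reports the CUDA version the build was compiled against.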
Each time I launch ComfyUI, it takes a few seconds longer to boot, but this way I know for certain that I have the supposedly best version. The issue persists, though, and it happens in every workflow, no matter which model is loaded. When I bought my GPU four months ago I tried to generate a 10-second video with LTX2.3 and it took half an hour.
PCI is OK, power is not limited, and my PC is in high-performance mode; everything that folks on Reddit told me to check has been checked and is fine. I even asked Gemini and Claude, but they think my card just came out and is so new that there is nothing available to fix the issue.
In this case you should already be using the correct version, and you can check the supported architectures via print(torch.cuda.get_arch_list()).
Assuming sm_120 is shown, the correct kernels are already running. If you are seeing performance issues, you might want to profile your code, e.g. via nsys, to narrow down the bottleneck, which could be unrelated to the GPU (e.g. data loading).
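The architecture check can be sketched as a comparison between the device's compute capability and the build's architecture list. This is a simplified illustration, not an official diagnostic: the helper names are hypothetical, and it only looks for an exact `sm_XY` match, ignoring any compatibility between related architectures.

```python
def arch_name(capability):
    """Map a (major, minor) compute capability to an 'sm_XY' name."""
    major, minor = capability
    return f"sm_{major}{minor}"


def has_native_kernels(arch_list, capability):
    """True if the build's arch list contains an exact entry for the device.

    Simplification: exact match only; compatible neighbouring architectures
    and PTX ('compute_XY') entries are not considered.
    """
    return arch_name(capability) in arch_list


def check_device(index=0):
    """Apply the check to a live device (requires torch and a CUDA GPU)."""
    import torch  # imported lazily so the pure helpers work without torch

    capability = torch.cuda.get_device_capability(index)
    return has_native_kernels(torch.cuda.get_arch_list(), capability)


if __name__ == "__main__":
    # Illustrative arch list, not the output of any particular build.
    print(has_native_kernels(["sm_90", "sm_100", "sm_120"], (12, 0)))
```

If `check_device()` returns True and generation is still slow, the next step suggested above applies: profile the workload rather than reinstalling PyTorch.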