Subject: Urgent: Critical CUDA Kernel Error with PyTorch on NVIDIA RTX 5060 Ti 16GB (VibeVoice TTS)
Dear Support Team/Developers,
I am writing to report a critical compatibility issue encountered while attempting to run the VibeVoice model using PyTorch on an NVIDIA RTX 5060 Ti 16GB GPU. The model loads successfully, but immediately fails during audio generation with a GPU kernel error; what little output I can produce is extremely slow (generating “hello everyone” took approximately 9 minutes).
1. Problem Overview
The core issue is that the installed PyTorch build appears not to include GPU kernels compiled for the RTX 5060 Ti’s newer architecture, so CUDA operations cannot execute on the card.
| Component | Detail |
|---|---|
| Hardware | NVIDIA GeForce RTX 5060 Ti 16GB VRAM |
| Operating System | Windows 10/11 |
| Driver | GeForce Game Ready Driver (GRD) Version 591.44 (Switched from Studio Driver) |
| AI Framework | PyTorch 2.5.1 (Installed for CUDA 12.1) |
| AI Model | VibeVoice-1.5B (Running via Gradio Interface) |
2. The Root Error
Upon initiating generation, the system consistently throws the following fatal CUDA Kernel Mismatch error:
RuntimeError: CUDA error: no kernel image is available for execution on the device
This error indicates that the compiled instruction set (kernel code) required to run PyTorch operations on the RTX 5060 Ti is missing from, or incompatible with, the current PyTorch build.
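For reference, the mismatch can be reasoned about from the build’s compiled architecture list (reported by `torch.cuda.get_arch_list()`). The sketch below models the compatibility check in plain Python; the `cu121_archs` list is an illustrative assumption about what the cu121 wheels ship, not an authoritative inventory — please check the real list on an affected install.

```python
def kernels_available(device_cap, arch_list):
    """Rough model of whether a PyTorch build can serve a given GPU.

    device_cap: (major, minor) compute capability, e.g. (12, 0) for the
    RTX 5060 Ti (Blackwell). arch_list: entries like "sm_90" (binary
    kernels) or "compute_90" (PTX the driver can JIT-compile forward).
    """
    dev = device_cap[0] * 10 + device_cap[1]
    for arch in arch_list:
        kind, _, ver = arch.partition("_")
        ver = int(ver)
        # Binary (cubin) kernels run on the same major architecture,
        # same or newer minor revision.
        if kind == "sm" and ver // 10 == dev // 10 and ver <= dev:
            return True
        # PTX can be JIT-compiled forward to newer architectures.
        if kind == "compute" and ver <= dev:
            return True
    return False

# Illustrative arch list for the cu121 wheels (assumption -- run
# torch.cuda.get_arch_list() on the affected machine for the real one):
cu121_archs = ["sm_50", "sm_60", "sm_70", "sm_75", "sm_80", "sm_86", "sm_90"]

print(kernels_available((12, 0), cu121_archs))  # False: no Blackwell kernels
print(kernels_available((9, 0), cu121_archs))   # True: sm_90 present
```

If no `sm_` or `compute_` entry covers capability 12.0, the “no kernel image” error is exactly what the CUDA runtime raises at the first kernel launch.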
3. Detailed Troubleshooting Steps Taken
To resolve this persistent issue, I performed extensive troubleshooting, including the most critical hardware/software compatibility fixes:
- Driver Fix: Switched from the Studio Driver (SD) to the latest Game Ready Driver (GRD 591.44).
- Performance Fix (Code Level): Modified `app.py` to set `inference_steps` to 1 for the fastest possible generation speed.
- PyTorch Reinstallation: Uninstalled and then reinstalled PyTorch, Torchvision, and Torchaudio (`pip install ... cu121`) after the GRD update, to ensure a clean install against the updated driver.
- Environment Overrides: Set the following variables (Windows cmd) to surface clearer error messages and tune memory allocation:

  ```bat
  set CUDA_LAUNCH_BLOCKING=1
  set TORCH_USE_CUDA_DSA=1
  set PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:512
  ```
Despite these measures, the error persists, confirming a core incompatibility between the current stable PyTorch release and the newer RTX 5060 Ti architecture.
4. Request for Resolution
I urgently request that your team investigate and address the lack of kernel image support for the RTX 50 series (Blackwell) within the current stable PyTorch releases.
Could you please provide guidance on:
- The correct `TORCH_CUDA_ARCH_LIST` value required to force compilation for the RTX 5060 Ti?
- A link to a PyTorch Nightly build or a patch that explicitly includes support for this new GPU architecture.
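For context, the workaround I am currently considering is switching to a nightly wheel built against a newer CUDA toolkit. The commands below follow the index-URL pattern from the official install selector; that the cu128 nightly channel is the right one for Blackwell support is my assumption — please confirm before I rely on it:

```shell
pip uninstall -y torch torchvision torchaudio
pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu128
```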
Thank you for your time and assistance in resolving this critical barrier to using your software on the latest NVIDIA hardware.