Critical CUDA Kernel Error with PyTorch on NVIDIA RTX 5060 Ti 16GB (VibeVoice TTS)


Dear Support Team/Developers,

I am writing to report a critical compatibility issue encountered while attempting to run the VibeVoice model using PyTorch on an NVIDIA RTX 5060 Ti 16GB GPU. The model loads successfully, but immediately fails during the audio generation process with a GPU kernel error, resulting in extremely poor performance (e.g., generating “hello everyone” took approximately 9 minutes).


1. Problem Overview

The core issue is that the installed PyTorch build does not include GPU kernels compiled for the RTX 5060 Ti's newer architecture, so operations cannot execute on the device.

| Component | Detail |
| --- | --- |
| Hardware | NVIDIA GeForce RTX 5060 Ti, 16 GB VRAM |
| Operating System | Windows 10/11 |
| Driver | GeForce Game Ready Driver (GRD) version 591.44 (switched from Studio Driver) |
| AI Framework | PyTorch 2.5.1 (installed for CUDA 12.1) |
| AI Model | VibeVoice-1.5B (running via Gradio interface) |

2. The Root Error

Upon initiating generation, the system consistently throws the following fatal CUDA error:

RuntimeError: CUDA error: no kernel image is available for execution on the device

This error indicates that the installed PyTorch build does not contain a compiled kernel image for the RTX 5060 Ti's compute capability, so PyTorch operations cannot run on the GPU.
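One way to confirm this kind of mismatch is to compare the GPU's compute capability (`torch.cuda.get_device_capability()`) with the kernel architectures compiled into the wheel (`torch.cuda.get_arch_list()`). A minimal sketch of that check; the arch list and capability values below are illustrative assumptions, not dumped from this machine:

```python
# Hypothetical diagnostic: does the installed build ship a kernel image
# for this GPU's compute capability?

def build_supports_device(arch_list, capability):
    """Return True if arch_list contains a kernel target for capability.

    arch_list  -- e.g. torch.cuda.get_arch_list() -> ['sm_50', ..., 'sm_90']
    capability -- e.g. torch.cuda.get_device_capability(0) -> (12, 0)
    """
    major, minor = capability
    return f"sm_{major}{minor}" in arch_list

# A cu121 wheel ships kernels only up to roughly sm_90 (assumed list below),
# while Blackwell cards such as the RTX 5060 Ti report capability 12.0 (sm_120).
cu121_archs = ["sm_50", "sm_60", "sm_70", "sm_75", "sm_80", "sm_86", "sm_90"]
print(build_supports_device(cu121_archs, (12, 0)))  # False -> "no kernel image"
print(build_supports_device(cu121_archs, (8, 6)))   # True  -> e.g. an RTX 3060
```

Running the two `torch.cuda` calls above on the affected machine would show whether the installed wheel actually lacks an `sm_120` target.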

3. Detailed Troubleshooting Steps Taken

To resolve this persistent issue, I performed extensive troubleshooting, including the most critical hardware/software compatibility fixes:

  • Driver Fix: Switched from the Studio Driver (SD) to the latest Game Ready Driver (GRD 591.44).

  • Performance Fix (Code Level): Modified app.py to set inference_steps to 1 for the fastest possible generation speed.

  • PyTorch Reinstallation: Uninstalled and then reinstalled PyTorch, Torchvision, and Torchaudio (pip install ... cu121) after the GRD update, to ensure proper linking with the new CUDA kernels.

  • Environment Overrides: Set the following variables to surface CUDA errors synchronously, enable device-side assertions, and cap allocator block splitting:

    set CUDA_LAUNCH_BLOCKING=1
    set TORCH_USE_CUDA_DSA=1
    set PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:512

Despite these measures, the error persists, which points to a core incompatibility between the current stable PyTorch release (2.5.1/cu121) and the RTX 5060 Ti's architecture.

4. Request for Resolution

I urgently request that your team investigate and address the lack of kernel image support for the RTX 50 series (Blackwell) within the current PyTorch releases.

Could you please provide guidance on:

  1. The correct TORCH_CUDA_ARCH_LIST value required to force compilation for the RTX 5060 Ti?

  2. A link to a PyTorch Nightly build or a patch that explicitly includes support for this new GPU architecture.

Thank you for your time and assistance in resolving this critical barrier to using your software on the latest NVIDIA hardware.


Install any of our PyTorch binaries built with CUDA 12.8+ and it will work, since Blackwell support was added in CUDA 12.8.
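A sketch of that fix, assuming the stable cu128 wheel index; verify the exact command against the install selector on pytorch.org for your OS and Python version:

```shell
# Remove the cu121 build, then install wheels built against CUDA 12.8
pip uninstall -y torch torchvision torchaudio
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128

# Afterwards, the arch list should include a Blackwell target (sm_120)
python -c "import torch; print(torch.__version__, torch.version.cuda); print(torch.cuda.get_arch_list())"
```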