Request: Add CUDA sm_120 (Blackwell) support for ConvNeXtV2 / fused kernels

Hi all,

I’m running into a reproducible CUDA kernel failure on an RTX 5090 (sm_120) when using models that rely on ConvNeXtV2 fused kernels. This appears to be due to missing sm_120 support in the current PyTorch builds.

Environment

  • GPU: NVIDIA RTX 5090 (Blackwell, sm_120)

  • Driver: NVIDIA Studio Driver 596.36

  • CUDA Toolkit: 12.4

  • CUDA Version: 13.2

  • PyTorch: 2.6.0

  • OS: Windows 11

Error

During inference, any model that uses ConvNeXtV2 or similar fused ops fails with:

RuntimeError: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

This happens consistently inside the fused ConvNeXtV2 convolution layers:

File ".../convnextv2.py", line 120, in forward
    x = self.forward_features(x)
...
File ".../conv.py", line 549, in _conv_forward
    return F.conv2d(...)
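
Since the traceback ends in a plain `F.conv2d` call, a standalone check may help separate a wheel-level problem from a model-specific one. The following is a minimal sketch, not the original repro: the function name `conv2d_smoke_test` and the tensor shapes are hypothetical, and the script assumes that a single small convolution is enough to trigger the same error.

```python
import os


def conv2d_smoke_test():
    """Run one small conv2d on the GPU and return a status string."""
    # Ask for synchronous kernel launches, as the error message suggests.
    # This only takes effect if CUDA has not been initialized yet.
    os.environ.setdefault("CUDA_LAUNCH_BLOCKING", "1")
    try:
        import torch
        import torch.nn.functional as F
    except ImportError:
        return "torch not installed"
    if not torch.cuda.is_available():
        return "no CUDA device"
    try:
        # Arbitrary small shapes; tensor creation itself launches kernels,
        # so it sits inside the try block as well.
        x = torch.randn(1, 3, 32, 32, device="cuda")
        w = torch.randn(8, 3, 3, 3, device="cuda")
        y = F.conv2d(x, w, padding=1)
        torch.cuda.synchronize()  # surface asynchronous kernel errors here
    except RuntimeError as exc:
        return f"failed: {exc}"
    return f"ok: output shape {tuple(y.shape)}"


if __name__ == "__main__":
    print(conv2d_smoke_test())
```

If this fails with the same "no kernel image" message, the installed wheel itself likely lacks kernels for the device; if it succeeds, the failure is more likely specific to the model's own ops.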

Summary of the issue

  • The installed PyTorch 2.6.0 wheel appears to include kernels only up to sm_90 (Hopper).

  • RTX 50xx GPUs require sm_120 kernels.

  • Fused kernels (ConvNeXtV2, some custom ops) cannot fall back to PTX JIT.

  • As a result, models that rely on these ops fail immediately on Blackwell GPUs.

Request

Could the PyTorch team provide guidance or a timeline for:

  1. Official sm_120 support in upcoming PyTorch wheels

  2. Rebuilt fused kernels (ConvNeXtV2 and similar ops) targeting sm_120

  3. Any nightly builds or experimental wheels that include Blackwell support

  4. Whether CUDA 12.8 or later will be required for full compatibility

There are already several users reporting similar issues with RTX 50xx cards, so I wanted to provide a clean repro case and error trace.

Happy to provide additional logs, environment details, or test builds if needed.

Thanks!

Please contact the model authors who developed these custom kernels and ask them to update them. PyTorch itself has supported all Blackwell GPUs since the 2.7 release.