RuntimeError: CUDA error: device kernel image is invalid

sicroci · April 2, 2025, 9:47pm

I get the following error on Windows: RuntimeError: CUDA error: device kernel image is invalid

print(torch.__version__)

2.6.0+cu126

nvidia-smi:

Wed Apr  2 23:40:03 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 572.61                 Driver Version: 572.61         CUDA Version: 12.8     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                  Driver-Model | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 4080 ...  WDDM  |   00000000:E1:00.0 Off |                  N/A |
|  0%   42C    P8              4W /  320W |     360MiB /  16376MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A            7696    C+G   ...ll Peripheral Manager\DPM.exe      N/A      |
|    0   N/A  N/A           11288    C+G   C:\Windows\explorer.exe               N/A      |
|    0   N/A  N/A           11352    C+G   ...indows\System32\ShellHost.exe      N/A      |
|    0   N/A  N/A           13408    C+G   ..._cw5n1h2txyewy\SearchHost.exe      N/A      |
|    0   N/A  N/A           13424    C+G   ...y\StartMenuExperienceHost.exe      N/A      |
|    0   N/A  N/A           15524    C+G   ...ouryDevice\asus_framework.exe      N/A      |
|    0   N/A  N/A           17764    C+G   ...gato\CameraHub\Camera Hub.exe      N/A      |
|    0   N/A  N/A           22980    C+G   ...t\Edge\Application\msedge.exe      N/A      |
|    0   N/A  N/A           24416    C+G   ...t\Edge\Application\msedge.exe      N/A      |
+-----------------------------------------------------------------------------------------+

nvcc --version

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2025 NVIDIA Corporation
Built on Fri_Feb_21_20:42:46_Pacific_Standard_Time_2025
Cuda compilation tools, release 12.8, V12.8.93
Build cuda_12.8.r12.8/compiler.35583870_0

ptrblck · April 2, 2025, 10:20pm

Could you either post a minimal and executable code snippet reproducing the issue or let us know which kernel raises the error?

sicroci · April 3, 2025, 8:05am

Here is a minimal snippet that triggers the same error:

import torch
device = torch.device("cuda")
torch.rand(10).to(device)

This is the complete error message:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\Thanos\miniconda3\lib\site-packages\torch\_tensor.py", line 568, in __repr__
    return torch._tensor_str._str(self, tensor_contents=tensor_contents)
  File "C:\Users\Thanos\miniconda3\lib\site-packages\torch\_tensor_str.py", line 704, in _str
    return _str_intern(self, tensor_contents=tensor_contents)
  File "C:\Users\Thanos\miniconda3\lib\site-packages\torch\_tensor_str.py", line 621, in _str_intern
    tensor_str = _tensor_str(self, indent)
  File "C:\Users\Thanos\miniconda3\lib\site-packages\torch\_tensor_str.py", line 353, in _tensor_str
    formatter = _Formatter(get_summarized_data(self) if summarize else self)
  File "C:\Users\Thanos\miniconda3\lib\site-packages\torch\_tensor_str.py", line 146, in __init__
    tensor_view, torch.isfinite(tensor_view) & tensor_view.ne(0)
RuntimeError: CUDA error: device kernel image is invalid
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

I uninstalled and reinstalled all NVIDIA software, including the drivers, on Windows 11, and I still get the same problem.

Here is more information:

>>> import torch
>>> print(torch.version.cuda)
12.6
>>> print(torch.cuda.is_available())
True
>>> print(torch.cuda.get_device_name(0))  # Get GPU name
NVIDIA GeForce RTX 4080 SUPER
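
The individual checks above can be gathered into one report, which is easier to paste into a thread like this. The following is only a sketch: the helper names `capability_to_arch` and `collect_cuda_report` are made up for illustration, and the report adds two values not shown above, the device's compute capability and the list of architectures the installed PyTorch binary was built for. As far as I know, the PyTorch pip/conda binaries ship their own CUDA runtime, so the locally installed toolkit reported by nvcc matters mainly when building from source or compiling custom extensions.

```python
import importlib.util


def capability_to_arch(major, minor):
    """Format a compute capability pair the way PyTorch names architectures."""
    return f"sm_{major}{minor}"


def collect_cuda_report():
    """Collect version and device details into a dict (hypothetical helper)."""
    report = {}
    # Avoid a hard failure if PyTorch is missing from this environment.
    if importlib.util.find_spec("torch") is None:
        report["torch"] = None
        return report

    import torch

    report["torch"] = torch.__version__
    report["torch_cuda"] = torch.version.cuda  # None for CPU-only builds
    report["cuda_available"] = torch.cuda.is_available()
    if report["cuda_available"]:
        report["device_name"] = torch.cuda.get_device_name(0)
        major, minor = torch.cuda.get_device_capability(0)
        report["device_arch"] = capability_to_arch(major, minor)
        # Architectures the installed binary was compiled for.
        report["compiled_archs"] = torch.cuda.get_arch_list()
    return report


if __name__ == "__main__":
    for key, value in collect_cuda_report().items():
        print(f"{key}: {value}")
```

If `device_arch` is not covered by `compiled_archs`, that is worth mentioning in the report, though it may not be the cause of this particular error.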

Output of nvcc --version:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Fri_Jun_14_16:44:19_Pacific_Daylight_Time_2024
Cuda compilation tools, release 12.6, V12.6.20
Build cuda_12.6.r12.6/compiler.34431801_0

Output of nvidia-smi:

Thu Apr  3 19:53:37 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 572.83                 Driver Version: 572.83         CUDA Version: 12.8     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                  Driver-Model | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 4080 ...  WDDM  |   00000000:E1:00.0 Off |                  N/A |
|  0%   42C    P5             20W /  320W |     252MiB /  16376MiB |      6%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A            8460    C+G   ...ouryDevice\asus_framework.exe      N/A      |
|    0   N/A  N/A           12136    C+G   ...ll Peripheral Manager\DPM.exe      N/A      |
|    0   N/A  N/A           14012    C+G   C:\Windows\explorer.exe               N/A      |
|    0   N/A  N/A           14020    C+G   ...indows\System32\ShellHost.exe      N/A      |
|    0   N/A  N/A           14964    C+G   ..._cw5n1h2txyewy\SearchHost.exe      N/A      |
|    0   N/A  N/A           14988    C+G   ...y\StartMenuExperienceHost.exe      N/A      |
|    0   N/A  N/A           20576    C+G   ...gato\CameraHub\Camera Hub.exe      N/A      |
+-----------------------------------------------------------------------------------------+

sicroci · April 4, 2025, 5:58am

I solved the problem by uninstalling and reinstalling all NVIDIA software (drivers, CUDA, …) and, this time, PyTorch as well.
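
For anyone verifying a similar reinstall, a small smoke test can confirm that kernels actually launch. This is a sketch under assumptions, not an official check: `cuda_smoke_test` is a hypothetical helper name, and it simply repeats the operations seen in the traceback above (`torch.isfinite(...) & tensor.ne(0)`) on a small CUDA tensor, then synchronizes so that an asynchronous kernel error surfaces inside the `try` block instead of at some later call.

```python
import importlib.util


def cuda_smoke_test():
    """Return (ok, message) after trying a tiny CUDA workload (hypothetical helper)."""
    if importlib.util.find_spec("torch") is None:
        return False, "torch is not installed"

    import torch

    if not torch.cuda.is_available():
        return False, "CUDA is not available to this PyTorch build"

    try:
        # Same pattern as the original reproduction: create on CPU, move to GPU.
        x = torch.rand(10).to(torch.device("cuda"))
        # The element-wise kernels that failed when the tensor was printed.
        mask = torch.isfinite(x) & x.ne(0)
        # Force pending kernels to finish so errors are raised here.
        torch.cuda.synchronize()
        count = int(mask.sum())
        name = torch.cuda.get_device_name(0)
        return True, f"OK on {name}: {count} finite non-zero values"
    except RuntimeError as exc:
        return False, f"CUDA kernel launch failed: {exc}"


if __name__ == "__main__":
    ok, message = cuda_smoke_test()
    print("PASS" if ok else "FAIL", "-", message)
```

Running it in the same environment that produced the error (here, the miniconda3 base environment) checks the PyTorch install that was actually replaced.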