CUDA Error: no kernel image is available for execution on the device

Hello everyone,

I am quite new to the topic of neural networks.
I've built a very simple CNN with the help of YouTube tutorials, and now I have a problem running my code on the GPU with CUDA. If I set the device to cpu, my code works perfectly well. But if I try to run it on the GPU, I get the error shown below. The GPU is an Nvidia Tesla K20m. torch.cuda.is_available() returns True. The code is running on Python 3.9. The operating system is Red Hat Enterprise Linux (RHEL) / CentOS 7. My input images are 144 x 144 with one channel. The images tensor that goes into my model has shape torch.Size([32, 1, 144, 144]). Do you know what the problem could be?

Python Script:

import torch
import torch.nn as nn # All neural network modules, nn.Linear, nn.Conv2d, BatchNorm, Loss functions
import torchvision.transforms as transforms # Transformations we can perform on our dataset
import torchvision
import torch.nn.functional as F
from torch.utils.data import (Dataset, DataLoader) # Gives easier dataset management and creates mini batches
import matplotlib.pyplot as plt
import pandas as pd
from skimage import io
import numpy as np
import os

# Class for custom dataset

class SurfaceDataset(Dataset):
    def __init__(self, csv_file, root_dir, transform=None):
        self.annotations = pd.read_csv(csv_file)  # column 0: image file name, column 1: label
        self.root_dir = root_dir
        self.transform = transform

    def __len__(self):
        return len(self.annotations)

    def __getitem__(self, index):
        img_path = os.path.join(self.root_dir, self.annotations.iloc[index, 0])
        image = io.imread(img_path)
        y_label = torch.tensor(int(self.annotations.iloc[index, 1]))

        if self.transform:
            image = self.transform(image)

        return (image, y_label)

# Set device

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Print information about the usage of cuda

if torch.cuda.is_available():
    print("CUDA is available")
    print(f"Number of available GPUs is {torch.cuda.device_count()}")
else:
    print("CUDA isn't available")

# Hyperparameters

num_classes = 2
learning_rate = 1e-3
batch_size = 32 # Normally the batch-size should be something of 2^x with x = [0, 1, 2, 3, 4, …]
num_epochs = 2

print("-------------- Hyperparameter Settings --------------")
print(f"Number of classes: {num_classes}")
print(f"Learning rate: {learning_rate}")
print(f"Batch-size: {batch_size}")
print(f"Number of epochs: {num_epochs}")

# Load Data

dataset = SurfaceDataset(
    csv_file="Klassifizierung.csv",
    root_dir="Bilder",
    transform=transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.5,), (0.5,))]))

classes = ('Defekte Oberfläche', 'Defektfreie Oberfläche')  # "defective surface", "defect-free surface"

train_set, test_set = torch.utils.data.random_split(dataset, [800, 100]) # Set the ratio of train and test images
train_loader = DataLoader(dataset=train_set, batch_size=batch_size, shuffle=True)
test_loader = DataLoader(dataset=test_set, batch_size=batch_size, shuffle=False)

def imshow(img):
    img = img / 2 + 0.5  # Undo the Normalize((0.5,), (0.5,)) transform
    npimg = img.numpy()
    plt.imshow(np.transpose(npimg, (1, 2, 0)))
    plt.show()  # Display all open figures.

# Get some random training images

dataiter = iter(train_loader)
images, labels = next(dataiter)  # The built-in next() works on every PyTorch version; dataiter.next() was removed in later releases

# Show images

imshow(torchvision.utils.make_grid(images))  # make_grid tiles the batch into a single image and also converts from 1 channel to 3 channels

class ConvNet(nn.Module):
    def __init__(self):
        super(ConvNet, self).__init__()
        self.conv1 = nn.Conv2d(1, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 33 * 33, 120)
        self.fc2 = nn.Linear(120, 30)
        self.fc3 = nn.Linear(30, 2)

    def forward(self, x):
        # x -> batch_size, input_channels, height of the image, width of the image | batch_size, 1, 144, 144
        x = self.pool(F.relu(self.conv1(x)))  # -> batch_size, 6, 70, 70
        x = self.pool(F.relu(self.conv2(x)))  # -> batch_size, 16, 33, 33
        x = x.view(-1, 16 * 33 * 33)  # -> batch_size, 16 * 33 * 33
        x = F.relu(self.fc1(x))  # -> batch_size, 120
        x = F.relu(self.fc2(x))  # -> batch_size, 30
        x = self.fc3(x)  # -> batch_size, num_classes
        return x

model = ConvNet().to(device)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate) # import torch.optim for all Optimization algorithms, SGD, Adam, etc.

n_total_steps = len(train_loader)  # Number of batches per epoch
for epoch in range(num_epochs):
    for i, (images, labels) in enumerate(train_loader):

        images = images.to(device)  # torch.Size([batch_size, 1, 144, 144])
        labels = labels.to(device)  # torch.Size([batch_size])

        # Forward pass
        outputs = model(images)
        loss = criterion(outputs, labels)

        # Backward and optimize
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        if (i + 1) % 1 == 0:  # Print after every step
            print(f'Epoch [{epoch + 1}/{num_epochs}], Step [{i + 1}/{n_total_steps}], Loss: {loss.item():.4f}')

# Save the trained weights of the neural network

print('Finished Training')
PATH = './cnn.pth'
torch.save(model.state_dict(), PATH)

with torch.no_grad():
    n_correct = 0
    n_samples = 0
    n_class_correct = [0 for i in range(num_classes)]
    n_class_samples = [0 for i in range(num_classes)]
    for images, labels in test_loader:
        images = images.to(device)
        labels = labels.to(device)
        outputs = model(images)
        _, predicted = torch.max(outputs, 1)  # torch.max returns (value, index)
        n_samples += labels.size(0)
        n_correct += (predicted == labels).sum().item()

        for i in range(len(labels)):
            label = labels[i]
            pred = predicted[i]
            if (label == pred):
                n_class_correct[label] += 1
            n_class_samples[label] += 1

    acc = 100.0 * n_correct / n_samples
    print(f'Accuracy of the network: {acc} %')

    for i in range(num_classes):
        acc = 100.0 * n_class_correct[i] / n_class_samples[i]
        print(f'Accuracy of {classes[i]}: {acc} %')

Conda list:

Name Version Build Channel

_libgcc_mutex 0.1 main
blas 1.0 mkl
blosc 1.21.0 h8c45485_0
brotli 1.0.9 he6710b0_2
brunsli 0.1 h2531618_0
bzip2 1.0.8 h516909a_3 conda-forge
ca-certificates 2021.4.13 h06a4308_1
certifi 2020.12.5 py39h06a4308_0
charls 2.2.0 h2531618_0
cloudpickle 1.6.0 py_0
cudatoolkit 10.2.89 hfd86e86_1
cycler 0.10.0 py39h06a4308_0
cytoolz 0.11.0 py39h27cfd23_0
dask-core 2021.4.0 pyhd3eb1b0_0
decorator 4.4.2 pyhd3eb1b0_0
ffmpeg 4.3 hf484d3e_0 pytorch
freetype 2.10.4 h7ca028e_0 conda-forge
fsspec 0.9.0 pyhd3eb1b0_0
giflib 5.1.4 h14c3975_1
gmp 6.2.1 h58526e2_0 conda-forge
gnutls 3.6.13 h85f3911_1 conda-forge
imagecodecs 2021.3.31 py39h581e88b_0
imageio 2.9.0 pyhd3eb1b0_0
intel-openmp 2021.2.0 h06a4308_610
jpeg 9b h024ee3a_2
jxrlib 1.1 h7b6447c_2
kiwisolver 1.3.1 py39h2531618_0
lame 3.100 h14c3975_1001 conda-forge
lcms2 2.12 h3be6417_0
ld_impl_linux-64 2.33.1 h53a641e_7
lerc 2.2.1 h2531618_0
libaec 1.0.4 he6710b0_1
libdeflate 1.7 h27cfd23_5
libffi 3.3 he6710b0_2
libgcc-ng 9.1.0 hdf63c60_0
libgfortran-ng 7.3.0 hdf63c60_0
libiconv 1.16 h516909a_0 conda-forge
libpng 1.6.37 h21135ba_2 conda-forge
libstdcxx-ng 9.1.0 hdf63c60_0
libtiff 4.1.0 h2733197_1
libuv 1.40.0 h7b6447c_0
libwebp 1.0.1 h8e7db2f_0
libzopfli 1.0.3 he6710b0_0
locket 0.2.1 py39h06a4308_1
lz4-c 1.9.3 h2531618_0
matplotlib-base 3.3.4 py39h62a2d02_0
mkl 2021.2.0 h06a4308_296
mkl-service 2.3.0 py39h27cfd23_1
mkl_fft 1.3.0 py39h42c9631_2
mkl_random 1.2.1 py39ha9443f7_2
ncurses 6.2 he6710b0_1
nettle 3.6 he412f7d_0 conda-forge
networkx 2.5.1 pyhd3eb1b0_0
ninja 1.10.2 hff7bd54_1
numpy 1.20.1 py39h93e21f0_0
numpy-base 1.20.1 py39h7d8b39e_0
olefile 0.46 pyh9f0ad1d_1 conda-forge
openh264 2.1.1 h8b12597_0 conda-forge
openjpeg 2.3.0 h05c96fa_1
openssl 1.1.1k h27cfd23_0
pandas 1.2.4 py39h2531618_0
partd 1.2.0 pyhd3eb1b0_0
pillow 8.2.0 py39he98fc37_0
pip 21.0.1 py39h06a4308_0
pyparsing 2.4.7 pyhd3eb1b0_0
python 3.9.4 hdb3f193_0
python-dateutil 2.8.1 pyhd3eb1b0_0
python_abi 3.9 1_cp39 conda-forge
pytorch 1.8.1 py3.9_cuda10.2_cudnn7.6.5_0 pytorch
pytz 2021.1 pyhd3eb1b0_0
pywavelets 1.1.1 py39h6323ea4_4
pyyaml 5.4.1 py39h27cfd23_1
readline 8.1 h27cfd23_0
scikit-image 0.18.1 py39ha9443f7_0
scipy 1.6.2 py39had2a1c9_1
setuptools 52.0.0 py39h06a4308_0
six 1.15.0 pyh9f0ad1d_0 conda-forge
snappy 1.1.8 he6710b0_0
sqlite 3.35.4 hdfb4753_0
tifffile 2021.3.31 pyhd3eb1b0_1
tk 8.6.10 hbc83047_0
toolz 0.11.1 pyhd3eb1b0_0
torchaudio 0.8.1 py39 pytorch
torchvision 0.9.1 py39_cu102 pytorch
tornado 6.1 py39h27cfd23_0
typing_extensions 3.7.4.3 py_0 conda-forge
tzdata 2020f h52ac0ba_0
wheel 0.36.2 pyhd3eb1b0_0
xz 5.2.5 h7b6447c_0
yaml 0.2.5 h7b6447c_0
zfp 0.5.5 h2531618_6
zlib 1.2.11 h7b6447c_3
zstd 1.4.5 h9ceee32_0

Error message:

Traceback (most recent call last):
  File "/utmnt/ut/ft2/cql7772/Test2/OberflaechenKlassifizierung.py", line 105, in <module>
    outputs = model(images)
  File "/fibus/fs1/16/cql7772/.conda/envs/CNN-Klassifizierung/lib/python3.9/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/utmnt/ut/ft2/cql7772/Test2/OberflaechenKlassifizierung.py", line 84, in forward
    x = self.pool(F.relu(self.conv1(x))) # -> batch_size, 6, 70, 70
  File "/fibus/fs1/16/cql7772/.conda/envs/CNN-Klassifizierung/lib/python3.9/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/fibus/fs1/16/cql7772/.conda/envs/CNN-Klassifizierung/lib/python3.9/site-packages/torch/nn/modules/conv.py", line 399, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/fibus/fs1/16/cql7772/.conda/envs/CNN-Klassifizierung/lib/python3.9/site-packages/torch/nn/modules/conv.py", line 395, in _conv_forward
    return F.conv2d(input, weight, bias, self.stride,
RuntimeError: CUDA error: no kernel image is available for execution on the device

This error is raised when you try to run CUDA code that was not compiled for the compute capability of the device you are using.
The Tesla K20 should have a compute capability of 3.5, which is no longer included in the prebuilt binaries.
You could build PyTorch from source as described here.
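
To see this kind of mismatch on your own machine, a minimal sketch like the following may help. It assumes the usual CUDA compatibility rule: compiled GPU code (`sm_XY`) runs on devices with the same major version and an equal or higher minor version, and PTX (`compute_XY`) can be compiled at load time for newer devices. The helper names `parse_arch`, `has_kernel_image`, and `check_current_device` are made up for illustration. `torch.cuda.get_device_capability()` and `torch.cuda.get_arch_list()` are the PyTorch calls it relies on, and they may not exist in very old releases. The architecture list in the demo is hypothetical and not the actual list of any PyTorch release.

```python
import re


def parse_arch(name):
    """Turn an architecture name such as 'sm_37' into (kind, major, minor)."""
    match = re.fullmatch(r"(sm|compute)_(\d+)[a-z]?", name)
    if match is None:
        raise ValueError(f"unrecognised architecture name: {name!r}")
    kind, digits = match.group(1), int(match.group(2))
    return kind, digits // 10, digits % 10


def has_kernel_image(capability, arch_list):
    """Simplified check: is there usable code for a device of this capability?"""
    major, minor = capability
    for name in arch_list:
        kind, arch_major, arch_minor = parse_arch(name)
        if kind == "sm" and arch_major == major and arch_minor <= minor:
            return True  # binary code runs within the same major version
        if kind == "compute" and (arch_major, arch_minor) <= (major, minor):
            return True  # PTX can be JIT-compiled for newer devices
    return False


def check_current_device(index=0):
    """Compare the GPU at `index` with the architectures PyTorch was built for."""
    import torch  # lazy import: the helpers above need only the standard library

    capability = torch.cuda.get_device_capability(index)
    arch_list = torch.cuda.get_arch_list()
    return capability, arch_list, has_kernel_image(capability, arch_list)


# Hypothetical architecture list, for illustration only.
example_archs = ["sm_37", "sm_50", "sm_60", "sm_70"]
print(has_kernel_image((3, 5), example_archs))  # capability 3.5, e.g. Tesla K20
print(has_kernel_image((3, 7), example_archs))  # capability 3.7
```

On a machine with a GPU, calling `check_current_device()` returns the device capability, the architecture list of the installed build, and the result of the simplified check. Note that `torch.cuda.is_available()` can still return True when this check fails, which matches the situation described in the question.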

Thanks for your answer. Unfortunately, that's a pity. But honestly I am quite confused, because I looked up which CUDA toolkit the Tesla K20m supports, and it showed that it supports versions up to 10.2. As you can see in my list of packages, I have installed cudatoolkit 10.2.89 hfd86e86_1. So where exactly is the problem? Is it Python 3.9? Sorry for all the questions. I am trying to understand the root cause of the problem.

Quote from ptrblck via PyTorch Forums noreply@discuss.pytorch.org:

The issue is not caused by CUDA or Python, but by the size of the pip wheels and conda binaries.
Since the pip wheels in particular have a hard size limit, older compute capabilities are removed after some time, which was the case for 3.5. The current CUDA 10.2 binaries support compute capabilities 3.7-7.5 and the CUDA 11.1 binaries 3.7-8.6.

EDIT: you might also want to check this issue where some users were building binaries for sm_35.
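
As a rough illustration of the ranges quoted above, the sketch below checks whether a compute capability falls inside the range of a given binary flavour. The names `SUPPORTED_RANGES` and `in_supported_range` are made up for this example, and the dictionary keys are just labels. The range values come from this thread, and the two capabilities used are 3.5 (Tesla K20m) and 3.7 (Tesla K80). Treat it as a simplification: real compatibility depends on the exact architectures compiled into a build, not only on the lowest and highest value.

```python
# Compute-capability ranges quoted in this thread, keyed by binary flavour.
# Each entry is (lowest supported, highest supported) as (major, minor) tuples.
SUPPORTED_RANGES = {
    "cuda10.2": ((3, 7), (7, 5)),
    "cuda11.1": ((3, 7), (8, 6)),
}


def in_supported_range(capability, build):
    """Return True if `capability` lies within the quoted range for `build`."""
    lowest, highest = SUPPORTED_RANGES[build]
    # Tuples compare element by element, so (3, 5) < (3, 7) < (7, 5).
    return lowest <= tuple(capability) <= highest


for gpu, capability in {"Tesla K20m": (3, 5), "Tesla K80": (3, 7)}.items():
    print(gpu, capability, in_supported_range(capability, "cuda10.2"))
```

Under these assumptions the check fails for capability 3.5 and passes for 3.7 with the CUDA 10.2 binaries.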


Many thanks for your answer. Now it's a lot clearer to me. I also have the option of running my code on a Tesla K80 GPU. Perhaps the error will disappear there, since the Tesla K80 has a compute capability of 3.7, which should be fine for running my code.

My Python script worked perfectly well on the K80 card. Many thanks again!