RuntimeError: device-side assert triggered

Hi, I am working on a Linux server with a GeForce RTX 3090 GPU. I am using transfer learning with a pretrained resnet18 model. Since I have 2 classes to classify, the final layer has 2 output units. When I start training, it fails with a runtime error. Please help me resolve this.

RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

Here is the code:

import torch
import torch.nn as nn
import torch.optim as optim
from torch.optim import lr_scheduler
import numpy as np
import torchvision
from torchvision import datasets, models, transforms
import matplotlib.pyplot as plt
import time
import os
import copy

mean = np.array([0.5, 0.5, 0.5])
std = np.array([0.25, 0.25, 0.25])

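data_transforms = {
    # NOTE: the resize and ToTensor steps are missing from the post. The ones
    # below are assumed so that Normalize receives a tensor and every image in
    # a batch has the same size.
    'train': transforms.Compose([
        transforms.Resize((224, 224)),
        transforms.ToTensor(),
        transforms.Normalize(mean, std)
    ]),
    'val': transforms.Compose([
        transforms.Resize((224, 224)),
        transforms.ToTensor(),
        transforms.Normalize(mean, std)
    ]),
}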

data_dir = '//home//CAMA_cat//trainP1//'

image_datasets = {x: datasets.ImageFolder(os.path.join(data_dir, x),
                                          data_transforms[x])
                  for x in ['train', 'val']}

dataloaders = {x: torch.utils.data.DataLoader(image_datasets[x], batch_size=2,
                                              shuffle=True, num_workers=0)
               for x in ['train', 'val']}

dataset_sizes = {x: len(image_datasets[x]) for x in ['train', 'val']}
class_names = image_datasets['train'].classes

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

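def imshow(inp, title):
    """Undo the normalization and display a tensor image."""
    inp = inp.numpy().transpose((1, 2, 0))
    inp = std * inp + mean
    inp = np.clip(inp, 0, 1)
    plt.imshow(inp)
    plt.title(title)
    plt.show()

# Get one batch of training data and show it as a grid
inputs, classes = next(iter(dataloaders['train']))
out = torchvision.utils.make_grid(inputs)
imshow(out, title=[class_names[x] for x in classes])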

def train_model(model, criterion, optimizer, scheduler, num_epochs=25):
    since = time.time()
    best_model_wts = copy.deepcopy(model.state_dict())
    best_acc = 0.0

    for epoch in range(num_epochs):
        print('Epoch {}/{}'.format(epoch, num_epochs - 1))
        print('-' * 10)

        # Each epoch has a training and validation phase
        for phase in ['train', 'val']:
            if phase == 'train':
                model.train()  # Set model to training mode
            else:
                model.eval()   # Set model to evaluate mode

            running_loss = 0.0
            running_corrects = 0

            # Iterate over data.
            for inputs, labels in dataloaders[phase]:
                inputs = inputs.to(device)
                labels = labels.to(device)

                # forward
                # track history only in train
                with torch.set_grad_enabled(phase == 'train'):
                    outputs = model(inputs)
                    _, preds = torch.max(outputs, 1)
                    loss = criterion(outputs, labels)

                    # backward + optimize only if in training phase
                    if phase == 'train':
                        optimizer.zero_grad()
                        loss.backward()
                        optimizer.step()

                # statistics
                running_loss += loss.item() * inputs.size(0)
                running_corrects += torch.sum(preds == labels.data)

            if phase == 'train':
                scheduler.step()

            epoch_loss = running_loss / dataset_sizes[phase]
            epoch_acc = running_corrects.double() / dataset_sizes[phase]

            print('{} Loss: {:.4f} Acc: {:.4f}'.format(
                phase, epoch_loss, epoch_acc))

            # deep copy the model
            if phase == 'val' and epoch_acc > best_acc:
                best_acc = epoch_acc
                best_model_wts = copy.deepcopy(model.state_dict())

    time_elapsed = time.time() - since

    print('Training complete in {:.0f}m {:.0f}s'.format(
        time_elapsed // 60, time_elapsed % 60))
    print('Best val Acc: {:4f}'.format(best_acc))

    # load best model weights
    model.load_state_dict(best_model_wts)
    return model

model = models.resnet18(pretrained=True)

num_ftrs = model.fc.in_features


model.fc = nn.Linear(num_ftrs, 2)
model = model.to(device)
criterion = nn.BCELoss()
optimizer = optim.SGD(model.parameters(), lr=0.001)
step_lr_scheduler = lr_scheduler.StepLR(optimizer, step_size=7, gamma=0.1)
model = train_model(model, criterion, optimizer, step_lr_scheduler, num_epochs=25)

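Rerun your code via CUDA_LAUNCH_BLOCKING=1 python args, as suggested in the error message, and check which operation fails. With blocking launches the stack trace points at the kernel that actually raised the assert.
Device-side asserts are typically triggered by invalid operations inside a kernel, such as out-of-range indexing.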
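If setting the variable on the command line is inconvenient, the same effect can be had from a small launcher. The following is only a sketch: blocking_env and run_blocking are hypothetical helper names, and train.py stands in for whatever script is being debugged. The one fact it relies on is that CUDA_LAUNCH_BLOCKING has to be in the environment before the process initializes CUDA.

```python
import os
import subprocess
import sys


def blocking_env(base=None):
    """Return a copy of the environment with synchronous CUDA launches enabled."""
    env = dict(os.environ if base is None else base)
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env


def run_blocking(script, *args):
    """Run `script` with the current interpreter and CUDA_LAUNCH_BLOCKING=1."""
    cmd = [sys.executable, script, *args]
    return subprocess.run(cmd, env=blocking_env())


# Example (train.py is a hypothetical script name):
# run_blocking("train.py")
```

Because the launcher uses sys.executable, the child process runs in the same Python environment as the launcher itself.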

Thank you so much for the quick response. I tried that, but I get “No module named torch” when I run the script from the terminal. PyTorch is installed, though, and as far as I can tell the selected Python environment is correct. I checked with “pip show torch”.

This would indicate that a different Python environment is used in your terminal, so make sure to use the same one as in your original setup.
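A quick way to check which environment the terminal is really using is to ask the interpreter itself. This is a minimal sketch using only the standard library; describe_environment is a hypothetical helper name. Run it with the same python command that reported the missing module.

```python
import importlib.util
import sys


def describe_environment(module_name="torch"):
    """Report which interpreter is running and whether it can find a module."""
    spec = importlib.util.find_spec(module_name)
    return {
        "executable": sys.executable,
        "module_found": spec is not None,
        "module_location": spec.origin if spec is not None else None,
    }


info = describe_environment("torch")
print("Interpreter:", info["executable"])
print("torch importable:", info["module_found"], info["module_location"])
```

If the interpreter path printed here differs from the one your IDE or notebook uses, the pip on your PATH may belong to another environment as well. Running python -m pip show torch queries the pip that belongs to that exact interpreter.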

Thanks, I managed to fix that problem. I reran the code using CUDA_LAUNCH_BLOCKING=1 python args and got this error:
ValueError: Using a target size (torch.Size([2])) that is different to the input size (torch.Size([2, 2])) is deprecated. Please ensure they have the same size.
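The message says that the loss received model outputs of shape [batch, 2] but labels of shape [batch]. nn.BCELoss requires both to have the same shape and also expects probabilities in [0, 1] rather than raw logits. The sketch below reproduces the mismatch on the CPU with random stand-in tensors (not the real data) and shows one possible fix that is not taken from this thread: assuming the two classes are mutually exclusive, nn.CrossEntropyLoss accepts raw [batch, 2] logits together with integer class indices, which is what ImageFolder yields.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
logits = torch.randn(2, 2)     # stand-in for model(inputs): [batch, classes]
labels = torch.tensor([0, 1])  # ImageFolder-style class indices: [batch]

# BCELoss: input and target shapes differ, so this raises ValueError.
try:
    nn.BCELoss()(torch.sigmoid(logits), labels.float())
    bce_error = None
except ValueError as err:
    bce_error = str(err)
print("BCELoss error:", bce_error)

# CrossEntropyLoss: takes raw logits [batch, classes] and integer targets [batch].
ce_loss = nn.CrossEntropyLoss()(logits, labels)
print("CrossEntropyLoss:", ce_loss.item())
```

In the posted script this would amount to replacing criterion = nn.BCELoss() with criterion = nn.CrossEntropyLoss(), while keeping the 2-unit final layer. An alternative, under the same assumption, is a single output unit with nn.BCEWithLogitsLoss and float labels reshaped to [batch, 1].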