RuntimeError: CUDA error: device-side assert triggered at a certain iteration

The code:

def train_model(n_epoch, data):
  best_acc1 = 0
  iter = 0
  for epoch in tqdm(range(n_epoch)):
    for i, (images, labels) in tqdm(enumerate(data['train'])):

      if torch.cuda.is_available():
        images = images.cuda().float()
        labels = labels.cuda()
        images = Variable(images)
        labels = Variable(labels)
      # print('works')
      # Clear gradients w.r.t. parameters
      optimizer.zero_grad()
      # Forward pass to get output/logits
      features = Encoder(images)
      features = features.unsqueeze_(1)
      outputs = Decoder(features)
      # Calculate Loss: softmax --> cross entropy loss
      loss = criterion(outputs, labels)
      # Getting gradients w.r.t. parameters
      loss.backward()
      # Updating parameters
      optimizer.step()
      # print('Optimizer')
      iter += 1
      if iter % 500 == 0:
        # print("iter")
        # Calculate Accuracy         
        accuracy_v = data_accuracy(data, 'valid')

        is_best = accuracy_v > best_acc1
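        best_acc1 = max(accuracy_v, best_acc1)
        # The opening line of this call is missing from the post; save_checkpoint is an assumed helper name.
        save_checkpoint({
                'epoch': n_epoch,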
                'iter': iter,
                'state_dict_encoder': Encoder.state_dict(),
                'state_dict_decoder': Decoder.state_dict(),
                'best_acc1': best_acc1,
                'optimizer' : optimizer.state_dict(),
            }, is_best)

        # print("test")
        # accuracy_t = data_accuracy(data, 'train')
        # Print Loss
        print('Iteration: {}. Loss: {}. Accuracy {}'.format(iter, loss.item(), accuracy_v))

Error message:

RuntimeError: CUDA error: device-side assert triggered
Exception raised from launch_vectorized_kernel at /pytorch/aten/src/ATen/native/cuda/CUDALoops.cuh:146 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x42 (0x7fd6a31681e2 in /usr/local/lib/python3.6/dist-packages/torch/lib/
frame #1: void at::native::gpu_kernel_impl<__nv_hdl_wrapper_t<false, false, __nv_dl_tag<void (*)(at::TensorIterator&, c10::Scalar), &at::native::add_kernel_cuda, 4u>, float (float, float), float> >(at::TensorIterator&, __nv_hdl_wrapper_t<false, false, __nv_dl_tag<void (*)(at::TensorIterator&, c10::Scalar), &at::native::add_kernel_cuda, 4u>, float (float, float), float> const&) + 0xe03 (0x7fd6a4f37933 in /usr/local/lib/python3.6/dist-packages/torch/lib/
frame #2: void at::native::gpu_kernel<__nv_hdl_wrapper_t<false, false, __nv_dl_tag<void (*)(at::TensorIterator&, c10::Scalar), &at::native::add_kernel_cuda, 4u>, float (float, float), float> >(at::TensorIterator&, __nv_hdl_wrapper_t<false, false, __nv_dl_tag<void (*)(at::TensorIterator&, c10::Scalar), &at::native::add_kernel_cuda, 4u>, float (float, float), float> const&) + 0x11b (0x7fd6a4f3934b in /usr/local/lib/python3.6/dist-packages/torch/lib/
frame #3: void at::native::gpu_kernel_with_scalars<__nv_hdl_wrapper_t<false, false, __nv_dl_tag<void (*)(at::TensorIterator&, c10::Scalar), &at::native::add_kernel_cuda, 4u>, float (float, float), float> >(at::TensorIterator&, __nv_hdl_wrapper_t<false, false, __nv_dl_tag<void (*)(at::TensorIterator&, c10::Scalar), &at::native::add_kernel_cuda, 4u>, float (float, float), float> const&) + 0xeb (0x7fd6a4f395bb in /usr/local/lib/python3.6/dist-packages/torch/lib/
frame #4: <unknown function> + 0x192a486 (0x7fd6a4efb486 in /usr/local/lib/python3.6/dist-packages/torch/lib/
frame #5: at::native::add_kernel_cuda(at::TensorIterator&, c10::Scalar) + 0x1a (0x7fd6a4efc1fa in /usr/local/lib/python3.6/dist-packages/torch/lib/
frame #6: <unknown function> + 0xbce25e (0x7fd6dad8b25e in /usr/local/lib/python3.6/dist-packages/torch/lib/
frame #7: at::native::add_out(at::Tensor&, at::Tensor const&, at::Tensor const&, c10::Scalar) + 0x71 (0x7fd6dad81b61 in /usr/local/lib/python3.6/dist-packages/torch/lib/
frame #8: <unknown function> + 0xf3b932 (0x7fd6a450c932 in /usr/local/lib/python3.6/dist-packages/torch/lib/
frame #9: <unknown function> + 0x2e9fad8 (0x7fd6dd05cad8 in /usr/local/lib/python3.6/dist-packages/torch/lib/
frame #10: <unknown function> + 0x3377258 (0x7fd6dd534258 in /usr/local/lib/python3.6/dist-packages/torch/lib/
frame #11: torch::autograd::AccumulateGrad::apply(std::vector<at::Tensor, std::allocator<at::Tensor> >&&) + 0x38a (0x7fd6dd535aaa in /usr/local/lib/python3.6/dist-packages/torch/lib/
frame #12: <unknown function> + 0x3375bb7 (0x7fd6dd532bb7 in /usr/local/lib/python3.6/dist-packages/torch/lib/
frame #13: torch::autograd::Engine::evaluate_function(std::shared_ptr<torch::autograd::GraphTask>&, torch::autograd::Node*, torch::autograd::InputBuffer&, std::shared_ptr<torch::autograd::ReadyQueue> const&) + 0x1400 (0x7fd6dd52e400 in /usr/local/lib/python3.6/dist-packages/torch/lib/
frame #14: torch::autograd::Engine::thread_main(std::shared_ptr<torch::autograd::GraphTask> const&) + 0x451 (0x7fd6dd52efa1 in /usr/local/lib/python3.6/dist-packages/torch/lib/
frame #15: torch::autograd::Engine::thread_init(int, std::shared_ptr<torch::autograd::ReadyQueue> const&, bool) + 0x89 (0x7fd6dd527119 in /usr/local/lib/python3.6/dist-packages/torch/lib/
frame #16: torch::autograd::python::PythonEngine::thread_init(int, std::shared_ptr<torch::autograd::ReadyQueue> const&, bool) + 0x4a (0x7fd6eacc734a in /usr/local/lib/python3.6/dist-packages/torch/lib/
frame #17: <unknown function> + 0xbd6df (0x7fd7076806df in /usr/lib/x86_64-linux-gnu/
frame #18: <unknown function> + 0x76db (0x7fd7087626db in /lib/x86_64-linux-gnu/
frame #19: clone + 0x3f (0x7fd708a9ba3f in /lib/x86_64-linux-gnu/

My code runs fine until around iteration 366, and then this error is shown. Any idea what it is? I can't debug it because the code seems to run fine until a certain iteration.
(I am running it in google colab)

The device assert indicates that you are doing out-of-bounds indexing on one of your Tensors. For example:

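x = torch.randn(4, device='cuda')  # x has indices 0, 1, 2, 3
idx = torch.tensor([4], device='cuda')  # index 4 is out of range
print(x[idx])  # device assert (a plain x[4] is caught on the host as an IndexError)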

I am loading the dataset using the default PyTorch ImageFolder class. I can't seem to pinpoint any issues here with the tensors.

def load_data(train_path,valid_path, batch_size, n_iters):
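  # The remaining ImageFolder arguments (e.g. transforms) are missing from the post.
  train_dataset = datasets.ImageFolder(train_path)
  valid_dataset = datasets.ImageFolder(valid_path)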

  epoch = n_iters/(len(train_dataset)/batch_size)
  epoch = int(epoch)
  print("Number of epoch {}".format(epoch))

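  # The loader constructor is missing from the post; torch.utils.data.DataLoader is assumed here.
  train_loader = torch.utils.data.DataLoader(
      train_dataset,
      batch_size = batch_size,
      shuffle = False)

  val_loader = torch.utils.data.DataLoader(
      valid_dataset,
      batch_size = batch_size,
      shuffle = False)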

  print('\nTraining data info {}'.format(train_dataset))
  print('\nValid data info {}'.format(valid_dataset))

  # print('\n\nTrain set length {}'.format(train_set))
  # print('\nValidation set length {}'.format(val_set))

  print('\n\nTraining Batch loader size {}'.format(len(train_loader)))
  print('\nValidation Batch loader size {}'.format(len(val_loader)))

  return epoch, train_loader, val_loader

Also, the error seems to occur while calculating the loss. The classes seem to be fine according to the dataset.


Could you rerun the code via:


and post the stack trace here?
Alternatively, you could also run the script on the CPU, which should give you a better error message.
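
CUDA kernels are launched asynchronously, so the Python stack trace of a device-side assert can point at a later operation than the one that actually failed. One common way to get a more precise trace on the GPU is to force synchronous launches with the CUDA_LAUNCH_BLOCKING environment variable. A minimal sketch for a notebook such as Colab, assuming it runs in the first cell after a runtime restart and before any CUDA call:

```python
import os

# Assumption: this runs before the first CUDA call of the process;
# in a notebook, restart the runtime and put it in the first cell.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

# Confirm the variable is visible to the current process.
print(os.environ["CUDA_LAUNCH_BLOCKING"])
```

Synchronous launches slow training down, so this setting is usually only kept on while debugging.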

IndexError                                Traceback (most recent call last)
<ipython-input-24-9ecd5aa7284f> in <module>()
----> 1 train_model(n_epoch,data_load)

4 frames
<ipython-input-22-ad52cecfd4c0> in train_model(n_epoch, data)
     23       # Calculate Loss: softmax --> cross entropy loss
---> 24       loss = criterion(outputs, labels)
     26       # Getting gradients w.r.t. parameters

/usr/local/lib/python3.6/dist-packages/torch/nn/modules/ in _call_impl(self, *input, **kwargs)
    720             result = self._slow_forward(*input, **kwargs)
    721         else:
--> 722             result = self.forward(*input, **kwargs)
    723         for hook in itertools.chain(
    724                 _global_forward_hooks.values(),

/usr/local/lib/python3.6/dist-packages/torch/nn/modules/ in forward(self, input, target)
    946     def forward(self, input: Tensor, target: Tensor) -> Tensor:
    947         return F.cross_entropy(input, target, weight=self.weight,
--> 948                                ignore_index=self.ignore_index, reduction=self.reduction)

/usr/local/lib/python3.6/dist-packages/torch/nn/ in cross_entropy(input, target, weight, size_average, ignore_index, reduce, reduction)
   2420     if size_average is not None or reduce is not None:
   2421         reduction = _Reduction.legacy_get_string(size_average, reduce)
-> 2422     return nll_loss(log_softmax(input, 1), target, weight, None, ignore_index, None, reduction)

/usr/local/lib/python3.6/dist-packages/torch/nn/ in nll_loss(input, target, weight, size_average, ignore_index, reduce, reduction)
   2216                          .format(input.size(0), target.size(0)))
   2217     if dim == 2:
-> 2218         ret = torch._C._nn.nll_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index)
   2219     elif dim == 4:
   2220         ret = torch._C._nn.nll_loss2d(input, target, weight, _Reduction.get_enum(reduction), ignore_index)

IndexError: Target 2 is out of bounds.

This is the traceback after running it on the CPU.
Also, the model worked when I ran it on sample data that I converted to tensors without using ImageFolder. Could I be doing something wrong with ImageFolder?

As @smth suggested, the error is raised by an out-of-bounds indexing.
Based on the stack trace it seems that your model output contains logits (or log probabilities) for 2 classes, while the target uses a class index of 2, which would assume at least 3 classes.

nn.CrossEntropyLoss and nn.NLLLoss expect a model output in the shape [batch_size, nb_classes] and a target in the shape [batch_size] containing class indices in the range [0, nb_classes-1].
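
The range requirement can be checked before the loss is computed, which gives a readable message instead of a device-side assert. The helper below is a hypothetical sketch (its name and the sample values are not from the thread); it works on plain Python integers so it can run without a GPU:

```python
def out_of_range_targets(labels, nb_classes):
    """Return the sorted distinct label values outside [0, nb_classes - 1]."""
    return sorted({int(l) for l in labels if not 0 <= int(l) < nb_classes})


# Hypothetical batch of class indices for a model with 2 output classes.
labels = [0, 1, 2, 1]
nb_classes = 2

bad = out_of_range_targets(labels, nb_classes)
print(bad)  # [2]
```

With tensors, the same check could be called as out_of_range_targets(labels.tolist(), outputs.size(1)) right before criterion(outputs, labels).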

Here is a small code snippet to reproduce this error:

import torch
import torch.nn as nn

batch_size = 2
nb_classes = 2
output = torch.randn(batch_size, nb_classes, requires_grad=True)
target = torch.randint(0, nb_classes, (batch_size,))

# Valid targets (0 or 1): the loss is computed without error
criterion = nn.CrossEntropyLoss()
loss = criterion(output, target)

# Class index 2 is outside [0, nb_classes-1]
target[0] = 2
loss = criterion(output, target)
# > IndexError: Target 2 is out of bounds.

How many classes are created in the ImageFolder and what is the shape of your model output?
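
ImageFolder assigns one class index per sub-directory of its root, in sorted name order, so the number of class folders determines how many outputs the model needs. On the real dataset, len(train_dataset.classes) and train_dataset.class_to_idx report this directly. The sketch below is a plain-Python imitation of that mapping, not torchvision code, and the folder names are made up for illustration:

```python
import os
import tempfile


def folder_class_to_idx(root):
    """Map each sub-directory name of `root` to an index, in sorted order."""
    classes = sorted(e.name for e in os.scandir(root) if e.is_dir())
    return {name: i for i, name in enumerate(classes)}


# Hypothetical dataset layout with three class folders.
with tempfile.TemporaryDirectory() as root:
    for name in ("cat", "dog", "bird"):
        os.makedirs(os.path.join(root, name))
    mapping = folder_class_to_idx(root)

print(mapping)       # {'bird': 0, 'cat': 1, 'dog': 2}
print(len(mapping))  # 3
```

A model trained on this layout would need 3 output logits, since the largest class index is 2.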

Solved the issue. It seems my RNN was outputting logits for only 2 classes when I had more than 2 classes.

Thank you for the detailed explanation.
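
A possible reason the run only failed at a certain iteration: both loaders use shuffle = False, and ImageFolder lists its samples class by class, so the first label 2 is reached only after every sample of classes 0 and 1 has been read. A rough sketch of that arithmetic, with made-up class counts and batch size (not the poster's real numbers):

```python
def first_batch_with_class(class_counts, class_index, batch_size):
    """Return the 0-based batch index at which `class_index` first appears
    when samples are ordered by class and read without shuffling."""
    # Number of samples that belong to earlier classes.
    samples_before = sum(class_counts[:class_index])
    return samples_before // batch_size


# Hypothetical dataset: three classes, counts chosen only for illustration.
class_counts = [1000, 1000, 500]
batch_size = 10

print(first_batch_with_class(class_counts, 2, batch_size))  # 200
```

With shuffle = True the out-of-range label would most likely have shown up within the first few batches instead.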