Runtime error Cuda device side assert

I’m getting the error below, which i’m unable to understand.


I read a similar kind of error, they where taking about out of index bound error.
Can anyone help me on this>

A number of things can trigger a device-side assert error in CUDA, the most common of which being indexing an array out of bounds. Could you provide more context behind what you’re running?

1 Like

I’m trying to train a cnn model. So, when the training reaches the loss.backward(), the error occurs. The same model has worked before. But, I used torchsample to transform the images, which I did’t use previously.
I tried to run without GPU, and I got this error,
RuntimeError: Assertioncur_target >= 0 && cur_target < n_classes’ failed. at /opt/conda/conda-bld/pytorch_1503968623488/work/torch/lib/THNN/generic/ClassNLLCriterion.c:62`

Good idea trying to run on CPU, that error gives a little more context. It sounds like your targets (that you’re passing to NLLLoss or CrossEntropyLoss) are out of bounds? What are your targets and how many classes are there?

Actualy it is a binary classification problem. So, I used a single neuron at the final layer, so the output I got out of the model was [50x1]. This gave rise to the following error at loss calculation,
RuntimeError: Assertioncur_target >= 0 && cur_target < n_classes’ failed. at /opt/conda/conda-bld/pytorch_1503968623488/work/torch/lib/THNN/generic/ClassNLLCriterion.c:62.
Now I squeezed the output so, that it matches the labels size of 50. But, after that, I got another problem, and I changed the output to two neurons, and it worked.
Why can’t i use a single neuron at the end?

What was the problem that you got with a single neuron?

The reason is that NLLCriterion expects the input’s last dimension to match num_classes. (http://pytorch.org/docs/0.2.0/nn.html#torch.nn.NLLLoss)

To get around this, you can

  1. explicitly write out your loss
  2. cat the Pr[0] with Pr[1], which you derive from Pr[0], together.

I got some dimension error. Sorry I couldn’t reproduce the same error. But it works now after doing what SimonW said. Thank you.