Runtime error: CUDA device-side assert

sManohar201 · November 2, 2017, 9:41am

I’m getting the error below, which I’m unable to understand.

I read about a similar error, where they were talking about an index-out-of-bounds error.
Can anyone help me with this?

richard · November 2, 2017, 2:44pm

A number of things can trigger a device-side assert error in CUDA, the most common of which is indexing an array out of bounds. Could you provide more context about what you’re running?

sManohar201 · November 2, 2017, 3:42pm

I’m trying to train a CNN model. When the training reaches loss.backward(), the error occurs. The same model has worked before, but this time I used torchsample to transform the images, which I didn’t use previously.
I tried to run without the GPU, and I got this error:
RuntimeError: Assertion `cur_target >= 0 && cur_target < n_classes' failed. at /opt/conda/conda-bld/pytorch_1503968623488/work/torch/lib/THNN/generic/ClassNLLCriterion.c:62

richard · November 2, 2017, 4:11pm

Good idea to try running on CPU; that error gives a little more context. It sounds like your targets (the ones you’re passing to NLLLoss or CrossEntropyLoss) are out of bounds. What are your targets, and how many classes are there?
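
One way to answer that question before the loss is ever called is to check the range of the targets directly. The snippet below is only a sketch: `check_targets` is a hypothetical helper written for this thread, not part of PyTorch, and it assumes the targets are a tensor of integer class indices.

```python
import torch

def check_targets(targets: torch.Tensor, n_classes: int) -> None:
    """Raise a readable error if any class index falls outside [0, n_classes)."""
    lo, hi = int(targets.min()), int(targets.max())
    if lo < 0 or hi >= n_classes:
        raise ValueError(
            f"targets span [{lo}, {hi}], but valid class indices "
            f"are [0, {n_classes - 1}]"
        )

# Binary labels are fine for a 2-class output ...
check_targets(torch.tensor([0, 1, 1, 0]), n_classes=2)

# ... but not for a 1-class output, where only index 0 is valid.
try:
    check_targets(torch.tensor([0, 1, 1, 0]), n_classes=1)
except ValueError as err:
    print(err)
```

Running a check like this on a batch from the data loader would also show whether a new transform pipeline has changed the labels.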

sManohar201 · November 2, 2017, 4:23pm

Actually, it is a binary classification problem, so I used a single neuron at the final layer, and the output I got out of the model was [50x1]. This gave rise to the following error at the loss calculation:
RuntimeError: Assertion `cur_target >= 0 && cur_target < n_classes' failed. at /opt/conda/conda-bld/pytorch_1503968623488/work/torch/lib/THNN/generic/ClassNLLCriterion.c:62
I then squeezed the output so that it matches the label size of 50. But after that I got another problem, so I changed the output to two neurons, and it worked.
Why can’t I use a single neuron at the end?

richard · November 2, 2017, 5:08pm

What was the problem that you got with a single neuron?

SimonW · November 2, 2017, 5:44pm

The reason is that NLLCriterion expects the input’s last dimension to match num_classes. (http://pytorch.org/docs/0.2.0/nn.html#torch.nn.NLLLoss)
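
A small CPU-only reproduction may make this concrete. It is a sketch with made-up tensors (4 samples instead of 50), not the original model: with an input of shape [4, 1] the loss sees one class, so a target of 1 is out of range. The exact exception type and message depend on the PyTorch version, so both `RuntimeError` and `IndexError` are caught here.

```python
import torch
import torch.nn.functional as F

targets = torch.tensor([0, 1, 0, 1])   # binary labels as class indices

# Input of shape [4, 1]: the class dimension has size 1, so only target 0 is valid.
one_class = torch.zeros(4, 1)
failed = False
try:
    F.nll_loss(one_class, targets)
except (RuntimeError, IndexError) as err:
    failed = True
    print("loss failed:", err)

# Input of shape [4, 2]: the class dimension has size 2, so targets 0 and 1 are valid.
two_class = torch.log(torch.full((4, 2), 0.5))   # log-probabilities
loss = F.nll_loss(two_class, targets)
print(loss.item())
```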

To get around this, you can

explicitly write out your loss, or
concatenate Pr[0] with Pr[1] (which you derive from Pr[0]), so that the input has two columns.
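
Both options can be sketched as follows. This is an illustration under assumptions, not the code from this thread: the tensors are random stand-ins, the single neuron's sigmoid is taken to be Pr[1] with Pr[0] = 1 - Pr[1] (the reverse convention works the same way), and `eps` is a small constant added only to avoid log(0).

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(50, 1)             # single-neuron output, shape [50, 1]
labels = torch.randint(0, 2, (50,))     # class indices 0 or 1, shape [50]

p1 = torch.sigmoid(logits)              # assumed to be Pr[1], shape [50, 1]
eps = 1e-7

# Option 1: write the binary cross-entropy out by hand.
y = labels.float().unsqueeze(1)         # shape [50, 1], matches p1
manual_loss = -(y * torch.log(p1 + eps)
                + (1 - y) * torch.log(1 - p1 + eps)).mean()

# Option 2: concatenate Pr[0] and Pr[1] into a [50, 2] input for the NLL loss.
probs = torch.cat([1 - p1, p1], dim=1)
nll = F.nll_loss(torch.log(probs + eps), labels)

print(manual_loss.item(), nll.item())   # the two values should agree closely
```

If your PyTorch version provides it, `F.binary_cross_entropy_with_logits(logits, y)` computes the same quantity from the raw logits in a numerically more stable way.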

sManohar201 · November 2, 2017, 9:44pm

I got some dimension error. Sorry, I couldn’t reproduce it. But it works now after doing what SimonW said. Thank you.