CUDA runtime error (59)


(Soumyadeep Ghosh) #1

I am getting this error on different runs of my network:

THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1503966894950/work/torch/lib/THCUNN/generic/Threshold.cu line=66 error=59 : device-side assert triggered
Traceback (most recent call last):
  File "train_finetune_cscrv.py", line 309, in <module>
    train(epoch)
  File "train_finetune_cscrv.py", line 231, in train
    classifier_loss.backward() # this loss has to backproped to G as well
  File "/home/iab/anaconda2/envs/pytorch/lib/python2.7/site-packages/torch/autograd/variable.py", line 156, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, retain_variables)
  File "/home/iab/anaconda2/envs/pytorch/lib/python2.7/site-packages/torch/autograd/__init__.py", line 98, in backward
    variables, grad_variables, retain_graph)
  File "/home/iab/anaconda2/envs/pytorch/lib/python2.7/site-packages/torch/autograd/function.py", line 91, in apply
    return self._forward_cls.backward(self, *args)
  File "/home/iab/anaconda2/envs/pytorch/lib/python2.7/site-packages/torch/nn/_functions/thnn/auto.py", line 187, in backward
    return (backward_cls.apply(input, grad_output, ctx.additional_args, ctx._backend, ctx.buffers, *tensor_params) +
  File "/home/iab/anaconda2/envs/pytorch/lib/python2.7/site-packages/torch/nn/_functions/thnn/auto.py", line 219, in backward_cls_forward
    update_grad_input_fn(ctx._backend.library_state, input, grad_output, grad_input, *gi_args)
RuntimeError: cuda runtime error (59) : device-side assert triggered at /opt/conda/conda-bld/pytorch_1503966894950/work/torch/lib/THCUNN/generic/Threshold.cu:66
/opt/conda/conda-bld/pytorch_1503966894950/work/torch/lib/THCUNN/ClassNLLCriterion.cu:57: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int, long) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [0,0,0] Assertion `t >= 0 && t < n_classes` failed.

There is a separate question about the same error, "RuntimeError: cuda runtime error (59)", but I am not sure how to fix it.

The next time I run the same code, I get this error:

  File "train_finetune_cscrv.py", line 309, in <module>
    train(epoch)
  File "train_finetune_cscrv.py", line 231, in train
    classifier_loss.backward() # this loss has to backproped to G as well
  File "/home/iab/anaconda2/envs/pytorch/lib/python2.7/site-packages/torch/autograd/variable.py", line 156, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, retain_variables)
  File "/home/iab/anaconda2/envs/pytorch/lib/python2.7/site-packages/torch/autograd/__init__.py", line 98, in backward
    variables, grad_variables, retain_graph)
RuntimeError: CUDNN_STATUS_EXECUTION_FAILED

PLEASE HELP !!!


(Yazhi Gao) #2

The error already says that your label data contains invalid values. The criterion only accepts labels t in the range 0 <= t < n_classes, as the failed assertion states.
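
A minimal sketch of how the labels could be checked before they reach the loss, based on the condition in the failed assertion (t >= 0 && t < n_classes). The helper name find_invalid_labels and the example data are hypothetical and not from the original script. The function takes plain Python integers; with a PyTorch tensor of targets you would probably convert it first, for example with .tolist() in recent PyTorch versions.

```python
def find_invalid_labels(labels, n_classes):
    """Return (position, value) pairs for labels outside [0, n_classes - 1]."""
    return [(i, t) for i, t in enumerate(labels) if not 0 <= t < n_classes]


# Hypothetical example: a 3-class problem whose labels were left 1-based,
# so the value 3 is out of range for the criterion.
labels = [1, 2, 3, 1]
bad = find_invalid_labels(labels, n_classes=3)
if bad:
    print("out-of-range labels (position, value):", bad)
```

Running a check like this on the CPU, before the batch is moved to the GPU, reports the offending label directly instead of a device-side assert.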


(Soumyadeep Ghosh) #3

Oh yes, I fixed it. But the error message was not at all helpful.
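
One likely reason the traceback looks misleading: CUDA kernels are launched asynchronously, so the Python traceback can name a later operation (here Threshold.cu during backward) instead of the one whose assert actually fired (ClassNLLCriterion.cu). A common debugging step is to set the CUDA_LAUNCH_BLOCKING environment variable so that kernel launches are synchronous. The sketch below assumes it is placed at the very top of the training script, before torch is imported. It slows training down, so it is meant for debugging only.

```python
import os

# Make CUDA kernel launches synchronous so errors surface at the call that
# caused them. Set this before CUDA is initialized; doing it before
# importing torch is the safe choice.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

# import torch  # the rest of the training script would follow here
```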