RuntimeError: cuda runtime error(59): device-side assert triggered at /pytorch/aten/src/THC/generic/THCTensorMath.cu:24

lmolhw · December 18, 2018, 7:39am

Hi, I get this error when I run my project.

ptrblck · December 18, 2018, 11:39am

Is the code running fine on CPU?
Usually the error messages on the CPU side are a bit clearer.
If so, run your code with CUDA_LAUNCH_BLOCKING=1 python script.py args and post the stack trace again. CUDA calls are asynchronous, so the current stack trace might point to the wrong line of code.
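
A minimal sketch of setting the same variable from inside the script instead of on the command line. It assumes these lines run before anything initializes CUDA; placing them ahead of the torch import is the cautious choice. Only the variable name comes from the post above, the rest is illustrative.

```python
import os

# Make CUDA kernel launches synchronous so the Python stack trace
# points at the call that actually failed.
# Assumption: nothing has initialized CUDA yet at this point.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

# Import torch (and anything that uses CUDA) only after this line, e.g.:
# import torch
```

Synchronous launches slow training down, so this is meant for debugging runs only.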

Sarah_K · December 19, 2018, 7:19pm

Hi,

I’m getting the same error.
@ptrblck I tried running it the way you suggested, but I am still getting the same errors:

EncoderDecoderNet(
  (encoder): Encoder(
    (module): Sequential(
      (conv_0): Conv1d(14, 64, kernel_size=(51,), stride=(1,), padding=(25,))
      (cn_0): ChannelNorm()
      (relu_0): ReLU()
      (pool_0): MaxPool1d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
      (conv_1): Conv1d(64, 96, kernel_size=(51,), stride=(1,), padding=(25,))
      (cn_1): ChannelNorm()
      (relu_1): ReLU()
      (pool_1): MaxPool1d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
      (conv_2): Conv1d(96, 128, kernel_size=(51,), stride=(1,), padding=(25,))
      (cn_2): ChannelNorm()
      (relu_2): ReLU()
      (pool_2): MaxPool1d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    )
  )
  (decoder): Decoder(
    (module): Sequential(
      (up_0): Upsample(scale_factor=2, mode=nearest)
      (conv_0): ConvTranspose1d(128, 96, kernel_size=(51,), stride=(1,), padding=(25,))
      (cn_0): ChannelNorm()
      (relu_0): ReLU()
      (up_1): Upsample(scale_factor=2, mode=nearest)
      (conv_1): ConvTranspose1d(96, 64, kernel_size=(51,), stride=(1,), padding=(25,))
      (cn_1): ChannelNorm()
      (relu_1): ReLU()
      (up_2): Upsample(scale_factor=2, mode=nearest)
      (conv_2): ConvTranspose1d(64, 64, kernel_size=(51,), stride=(1,), padding=(25,))
      (cn_2): ChannelNorm()
      (relu_2): ReLU()
    )
  )
  (fc1): Linear(in_features=64, out_features=32, bias=True)
  (fc2): Linear(in_features=32, out_features=6, bias=True)
)
0
/home/x/.local/lib/python3.6/site-packages/torch/nn/modules/upsampling.py:129: UserWarning: nn.Upsample is deprecated. Use nn.functional.interpolate instead.
  warnings.warn("nn.{} is deprecated. Use nn.functional.interpolate instead.".format(self.name))
/pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:105: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int, long) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [7,0,0] Assertion `t >= 0 && t < n_classes` failed.
/pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:105: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int, long) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [8,0,0] Assertion `t >= 0 && t < n_classes` failed.
/pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:105: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int, long) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [9,0,0] Assertion `t >= 0 && t < n_classes` failed.
/pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:105: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int, long) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [10,0,0] Assertion `t >= 0 && t < n_classes` failed.
/pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:105: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int, long) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [11,0,0] Assertion `t >= 0 && t < n_classes` failed.
/pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:105: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int, long) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [12,0,0] Assertion `t >= 0 && t < n_classes` failed.
/pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:105: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int, long) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [13,0,0] Assertion `t >= 0 && t < n_classes` failed.
/pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:105: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int, long) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [14,0,0] Assertion `t >= 0 && t < n_classes` failed.
/pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:105: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int, long) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [15,0,0] Assertion `t >= 0 && t < n_classes` failed.
THCudaCheck FAIL file=/pytorch/aten/src/THCUNN/generic/ClassNLLCriterion.cu line=111 error=59 : device-side assert triggered
Traceback (most recent call last):
  File "tcn_main.py", line 74, in <module>
    main()
  File "tcn_main.py", line 56, in main
    naming)  #8x6
  File "/home/x/RL-Surgical-Gesture-Segmentation-master/tcn_train_test.py", line 239, in cross_validate
    log_dir=log_dir)
  File "/home/x/RL-Surgical-Gesture-Segmentation-master/tcn_train_test.py", line 65, in train_model
    loss = criterion(input=flatten_out, target=gesture)
  File "/home/x/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/x/.local/lib/python3.6/site-packages/torch/nn/modules/loss.py", line 904, in forward
    ignore_index=self.ignore_index, reduction=self.reduction)
  File "/home/x/.local/lib/python3.6/site-packages/torch/nn/functional.py", line 1970, in cross_entropy
    return nll_loss(log_softmax(input, 1), target, weight, None, ignore_index, None, reduction)
  File "/home/x/.local/lib/python3.6/site-packages/torch/nn/functional.py", line 1790, in nll_loss
    ret = torch._C._nn.nll_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index)
RuntimeError: cuda runtime error (59) : device-side assert triggered at /pytorch/aten/src/THCUNN/generic/ClassNLLCriterion.cu:111
Traceback (most recent call last):

ptrblck · December 19, 2018, 8:27pm

It looks like your target values are not in the expected range of [0, nb_classes-1]. Judging by fc2 (out_features=6), that would be 0 to 5.
Could you check that and, if possible, post the code that creates the target indices?
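
One way to run that check, sketched under assumptions: `check_targets` is a hypothetical helper, the example labels are made up, and the class count of 6 is taken from `out_features=6` of `fc2` in the printout above. It does not account for an `ignore_index` value in the targets.

```python
import torch


def check_targets(target: torch.Tensor, n_classes: int) -> None:
    """Raise a readable error if any class index is outside [0, n_classes - 1]."""
    lo, hi = int(target.min()), int(target.max())
    if lo < 0 or hi >= n_classes:
        raise ValueError(
            f"target values span [{lo}, {hi}], expected [0, {n_classes - 1}]"
        )


# Hypothetical labels stored 1-based (1..6) for a 6-class model.
gesture = torch.tensor([1, 2, 6, 3])

try:
    check_targets(gesture, n_classes=6)
except ValueError as err:
    print(err)

# If the labels really are 1-based, shifting them to 0-based is one possible fix.
check_targets(gesture - 1, n_classes=6)
```

Calling such a check on the CPU right before the loss turns the device-side assert into an ordinary Python exception that names the offending range.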

Sarah_K · December 19, 2018, 10:04pm

Thank you for the reply.

Actually, it’s not my code; I am trying to reproduce the results of this code.

I’d really appreciate the help!

jastern33 · February 21, 2019, 10:42pm

Checking the CPU error message worked for me. It gave me:
Assertion `cur_target >= 0 && cur_target < n_classes' failed.
I had to make my network output match the number of classes.

ArturoDeza · April 9, 2019, 12:12am

This did the trick. I fixed the number of output classes in the final layer of the network and it’s running now.
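
For anyone landing here later, a rough sketch of that fix. The labels, batch size, and feature width are made up for illustration, and deriving the class count from the largest label assumes the labels are 0-based and that the highest class actually appears in the data.

```python
import torch
import torch.nn as nn

# Hypothetical 0-based class labels for a batch of 4 samples.
target = torch.tensor([0, 3, 7, 9])

# Assumption: labels are 0-based and the largest class is present.
n_classes = int(target.max()) + 1

# The final layer needs one output per class.
head = nn.Linear(in_features=32, out_features=n_classes)

features = torch.randn(4, 32)   # stand-in for the output of the previous layer
logits = head(features)         # shape: (batch, n_classes)

criterion = nn.CrossEntropyLoss()
loss = criterion(logits, target)
print(logits.shape, loss.item())
```

With fewer outputs than classes, the same loss call fails because some target index has no matching logit.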