Hi, when I run my project, I get this problem.
Is the code running fine on CPU?
Usually the error messages on the CPU side are a bit clearer.
If it does run fine on the CPU, try running your code with `CUDA_LAUNCH_BLOCKING=1 python script.py args`
and post the stack trace again, since CUDA calls are asynchronous and the current stack trace might point to the wrong code location.
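If changing the launch command is inconvenient (for example in a notebook), the same variable can be set from inside the script. This is a minimal sketch, assuming it runs at the very top of the entry script before anything has initialized CUDA; the placement of the `torch` import is shown only as a comment.

```python
import os

# Assumption: this runs before CUDA is initialized, so put it at the very top
# of the entry script, ahead of the torch import.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

# import torch  # import torch only after the variable has been set

print(os.environ["CUDA_LAUNCH_BLOCKING"])
```

With synchronous launches the Python stack trace should point at the call that actually failed, at the cost of slower execution, so it is best used only while debugging.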
Hi,
I’m getting the same error.
@ptrblck I tried running it the way you suggested, but I am still getting the same errors:
EncoderDecoderNet(
  (encoder): Encoder(
    (module): Sequential(
      (conv_0): Conv1d(14, 64, kernel_size=(51,), stride=(1,), padding=(25,))
      (cn_0): ChannelNorm()
      (relu_0): ReLU()
      (pool_0): MaxPool1d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
      (conv_1): Conv1d(64, 96, kernel_size=(51,), stride=(1,), padding=(25,))
      (cn_1): ChannelNorm()
      (relu_1): ReLU()
      (pool_1): MaxPool1d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
      (conv_2): Conv1d(96, 128, kernel_size=(51,), stride=(1,), padding=(25,))
      (cn_2): ChannelNorm()
      (relu_2): ReLU()
      (pool_2): MaxPool1d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    )
  )
  (decoder): Decoder(
    (module): Sequential(
      (up_0): Upsample(scale_factor=2, mode=nearest)
      (conv_0): ConvTranspose1d(128, 96, kernel_size=(51,), stride=(1,), padding=(25,))
      (cn_0): ChannelNorm()
      (relu_0): ReLU()
      (up_1): Upsample(scale_factor=2, mode=nearest)
      (conv_1): ConvTranspose1d(96, 64, kernel_size=(51,), stride=(1,), padding=(25,))
      (cn_1): ChannelNorm()
      (relu_1): ReLU()
      (up_2): Upsample(scale_factor=2, mode=nearest)
      (conv_2): ConvTranspose1d(64, 64, kernel_size=(51,), stride=(1,), padding=(25,))
      (cn_2): ChannelNorm()
      (relu_2): ReLU()
    )
  )
  (fc1): Linear(in_features=64, out_features=32, bias=True)
  (fc2): Linear(in_features=32, out_features=6, bias=True)
)
0
/home/x/.local/lib/python3.6/site-packages/torch/nn/modules/upsampling.py:129: UserWarning: nn.Upsample is deprecated. Use nn.functional.interpolate instead.
warnings.warn("nn.{} is deprecated. Use nn.functional.interpolate instead.".format(self.name))
/pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:105: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int, long) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [7,0,0] Assertion `t >= 0 && t < n_classes` failed.
/pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:105: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int, long) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [8,0,0] Assertion `t >= 0 && t < n_classes` failed.
/pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:105: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int, long) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [9,0,0] Assertion `t >= 0 && t < n_classes` failed.
/pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:105: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int, long) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [10,0,0] Assertion `t >= 0 && t < n_classes` failed.
/pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:105: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int, long) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [11,0,0] Assertion `t >= 0 && t < n_classes` failed.
/pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:105: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int, long) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [12,0,0] Assertion `t >= 0 && t < n_classes` failed.
/pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:105: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int, long) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [13,0,0] Assertion `t >= 0 && t < n_classes` failed.
/pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:105: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int, long) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [14,0,0] Assertion `t >= 0 && t < n_classes` failed.
/pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:105: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int, long) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [15,0,0] Assertion `t >= 0 && t < n_classes` failed.
THCudaCheck FAIL file=/pytorch/aten/src/THCUNN/generic/ClassNLLCriterion.cu line=111 error=59 : device-side assert triggered
Traceback (most recent call last):
  File "tcn_main.py", line 74, in <module>
    main()
  File "tcn_main.py", line 56, in main
    naming) #8x6
  File "/home/x/RL-Surgical-Gesture-Segmentation-master/tcn_train_test.py", line 239, in cross_validate
    log_dir=log_dir)
  File "/home/x/RL-Surgical-Gesture-Segmentation-master/tcn_train_test.py", line 65, in train_model
    loss = criterion(input=flatten_out, target=gesture)
  File "/home/x/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/x/.local/lib/python3.6/site-packages/torch/nn/modules/loss.py", line 904, in forward
    ignore_index=self.ignore_index, reduction=self.reduction)
  File "/home/x/.local/lib/python3.6/site-packages/torch/nn/functional.py", line 1970, in cross_entropy
    return nll_loss(log_softmax(input, 1), target, weight, None, ignore_index, None, reduction)
  File "/home/x/.local/lib/python3.6/site-packages/torch/nn/functional.py", line 1790, in nll_loss
    ret = torch._C._nn.nll_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index)
RuntimeError: cuda runtime error (59) : device-side assert triggered at /pytorch/aten/src/THCUNN/generic/ClassNLLCriterion.cu:111
It looks like your target values are not in the expected range of [0, nb_classes-1].
Could you check that and, if possible, post the code showing how you created the target indices?
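One way to run that check is to scan the labels before they reach the loss. The sketch below is only an illustration: `find_out_of_range_targets` is a hypothetical helper, the label list is made up, and `n_classes=6` is taken from the `out_features=6` of `fc2` in the printed model. It works on plain Python integers, so a target tensor such as `gesture` from the traceback would first be flattened with something like `gesture.view(-1).tolist()`.

```python
def find_out_of_range_targets(targets, n_classes):
    """Return (position, value) pairs for labels outside [0, n_classes - 1]."""
    return [
        (i, int(t))
        for i, t in enumerate(targets)
        if not 0 <= int(t) < n_classes
    ]


# Made-up example: labels numbered 1..6 fed to a model with 6 outputs.
labels = [1, 2, 3, 4, 5, 6]
bad = find_out_of_range_targets(labels, n_classes=6)
print(bad)  # [(5, 6)]
```

An empty list means every label is a valid class index. A non-empty list shows which values trip the `t >= 0 && t < n_classes` assertion; whether the fix is to remap the labels or to widen the final layer depends on how the dataset encodes its classes.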
Thank you for the reply.
Actually, it’s not my code; I am trying to reproduce the results of this code.
I’d really appreciate the help!
Checking the CPU error message worked for me. It gave me:
Assertion `cur_target >= 0 && cur_target < n_classes` failed.
I had to make my network output match the number of classes.
This did the trick. I fixed the number of output classes at the final layer of the network and it’s running now.