Getting a cuDNN error: CUDNN_STATUS_EXECUTION_FAILED

Dataset CS - Label range: [0, 19]
Dataset IDD20K - Label range: [0, 19]
Dataset CS - Label range: [0, 19]
Dataset IDD20K - Label range: [0, 19]
Dataset CS - Label range: [0, 1073741828]
Traceback (most recent call last):
  File "segment.py", line 661, in <module>
    main(args, get_dataset)
  File "segment.py", line 608, in main
    model = train(args, get_dataset, model, False) #Train
  File "segment.py", line 279, in train
    scaler.scale(loss_s).backward()
  File "/home/amax/anaconda3/envs/yyl/lib/python3.8/site-packages/torch/_tensor.py", line 396, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/home/amax/anaconda3/envs/yyl/lib/python3.8/site-packages/torch/autograd/__init__.py", line 173, in backward
    Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
RuntimeError: cuDNN error: CUDNN_STATUS_EXECUTION_FAILED
You can try to repro this exception using the following code snippet. If that doesn’t trigger the error, please include your original repro script when reporting this issue.

import torch
torch.backends.cuda.matmul.allow_tf32 = False
torch.backends.cudnn.benchmark = False
torch.backends.cudnn.deterministic = False
torch.backends.cudnn.allow_tf32 = True
data = torch.randn([6, 128, 64, 128], dtype=torch.half, device='cuda', requires_grad=True)
net = torch.nn.Conv2d(128, 128, kernel_size=[1, 3], padding=[0, 2], stride=[1, 1], dilation=[1, 2], groups=1)
net = net.cuda().half()
out = net(data)
out.backward(torch.randn_like(out))
torch.cuda.synchronize()
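The auto-generated snippet above only re-runs the convolution, so it may not fail on its own. CUDA kernels are launched asynchronously, which means an error raised by an earlier kernel can be reported later at an unrelated call such as this backward pass. A minimal sketch of forcing synchronous launches while debugging, assuming the training script runs as a normal Python process (the variable must be set before CUDA is initialized):

```python
import os

# Force synchronous CUDA kernel launches so a failing kernel is reported at
# the Python line that launched it, not at a later, unrelated call.
# It must be set before CUDA is initialized; setting it before importing
# torch is the safest place. It slows training, so use it only for debugging.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

import torch  # noqa: E402  (deliberately imported after setting the variable)

print("CUDA available:", torch.cuda.is_available())
```

The same effect can be had from the shell with `CUDA_LAUNCH_BLOCKING=1 python segment.py`, which avoids editing the script.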

ConvolutionParams
memory_format = Contiguous
data_type = CUDNN_DATA_HALF
padding = [0, 2, 0]
stride = [1, 1, 0]
dilation = [1, 2, 0]
groups = 1
deterministic = false
allow_tf32 = true
input: TensorDescriptor 0x7f9fed396640
type = CUDNN_DATA_HALF
nbDims = 4
dimA = 6, 128, 64, 128,
strideA = 1048576, 8192, 128, 1,
output: TensorDescriptor 0x7f9fed264960
type = CUDNN_DATA_HALF
nbDims = 4
dimA = 6, 128, 64, 128,
strideA = 1048576, 8192, 128, 1,
weight: FilterDescriptor 0x7f9fed35e5c0
type = CUDNN_DATA_HALF
tensor_format = CUDNN_TENSOR_NCHW
nbDims = 4
dimA = 128, 128, 1, 3,
Pointer addresses:
input: 0x7fa03c800000
output: 0x7fa0a5800000
weight: 0x7f9fe59e2800
Additional pointer addresses:
grad_output: 0x7fa0a5800000
grad_weight: 0x7f9fe59e2800
Backward filter algorithm: 1

…/aten/src/ATen/native/cuda/NLLLoss2d.cu:103: nll_loss2d_forward_kernel: block: [17,0,0], thread: [962,0,0] Assertion t >= 0 && t < n_classes failed.

cuDNN is just running into a sticky error caused by an out-of-bounds index in the loss calculation:

…/aten/src/ATen/native/cuda/NLLLoss2d.cu:103: nll_loss2d_forward_kernel: block: [17,0,0], thread: [962,0,0] Assertion t >= 0 && t < n_classes failed.
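One way to turn the device-side assert into a readable exception is to evaluate the same loss on CPU for the offending batch, where the target bounds check raises immediately. A minimal sketch, assuming a 19-class model and using random logits with one deliberately invalid target value (all values here are hypothetical):

```python
import torch
import torch.nn.functional as F

num_classes = 19  # hypothetical: number of output channels of the model
logits = torch.randn(2, num_classes, 4, 4)          # (N, C, H, W)
target = torch.randint(0, num_classes, (2, 4, 4))   # (N, H, W), int64 class indices
target[0, 0, 0] = num_classes  # one out-of-range label, e.g. 19 with 19 classes

caught = None
try:
    # On CPU the bounds check fails right here with the offending value,
    # instead of a device-side assert that breaks every later CUDA call.
    F.cross_entropy(logits, target)
except (IndexError, RuntimeError) as err:  # exception type varies by version
    caught = err
    print("loss failed:", err)
```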

Check the target and make sure it contains only class indices in the range [0, nb_classes-1].
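A minimal sketch of such a check, run on each target before it reaches the loss. `check_targets` is a hypothetical helper, and the optional `ignore_index` assumes the dataset marks unlabeled pixels with a single value (for example 255) that is also passed to the loss as `ignore_index`:

```python
from typing import Optional

import torch


def check_targets(target: torch.Tensor, num_classes: int,
                  ignore_index: Optional[int] = None) -> None:
    """Raise ValueError if `target` holds anything but valid class indices."""
    # cross_entropy / nll_loss expect int64 class indices.
    if target.dtype != torch.long:
        raise ValueError(f"expected int64 targets, got {target.dtype}")
    valid = (target >= 0) & (target < num_classes)
    if ignore_index is not None:
        valid |= target == ignore_index  # pixels the loss will skip
    if not bool(valid.all()):
        bad = target[~valid]
        raise ValueError(
            f"{bad.numel()} invalid labels; min={int(bad.min())}, "
            f"max={int(bad.max())}, allowed=[0, {num_classes - 1}]"
        )


# Usage: call it on the target right before computing the loss.
check_targets(torch.tensor([[0, 18], [255, 3]]), num_classes=19, ignore_index=255)
```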

I thought that was the cause at first, but I didn't find any invalid values when I checked the mask. What's more, these are the last lines printed before the crash:

Dataset CS - Label range: [0, 19]
Dataset IDD20K - Label range: [0, 19]
Dataset CS - Label range: [0, 1073741828]
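Because the out-of-range value shows up in only one CS batch, it may help to log which samples produced it. If the masks are stored as 8- or 16-bit images, a value like 1073741828 cannot come directly from the file, which suggests checking the loading and transform steps for that sample. A minimal sketch, assuming the dataset can return each sample's file name alongside its mask (`find_bad_samples`, `names`, and the file names are hypothetical):

```python
from typing import List, Optional, Sequence, Tuple

import torch


def find_bad_samples(targets: torch.Tensor, num_classes: int,
                     names: Sequence[str],
                     ignore_index: Optional[int] = None) -> List[Tuple[str, int, int]]:
    """Return (name, min, max) for every mask in the batch with invalid labels."""
    bad = []
    for name, mask in zip(names, targets):  # iterate over the batch dimension
        valid = (mask >= 0) & (mask < num_classes)
        if ignore_index is not None:
            valid |= mask == ignore_index
        if not bool(valid.all()):
            bad.append((name, int(mask.min()), int(mask.max())))
    return bad


# Usage with a simulated batch where one mask holds a corrupt value.
masks = torch.zeros(3, 4, 4, dtype=torch.long)
masks[1, 2, 2] = 1073741828
result = find_bad_samples(masks, 19, ["a.png", "b.png", "c.png"])
print(result)  # [('b.png', 0, 1073741828)]
```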