Cross-entropy loss throws an error even though labels are padded with the ignore index

In torch.nn.modules.loss, CrossEntropyLoss takes an ignore_index parameter that excludes targets with that value from the loss. I pad my labels with -100 during training, using the following code:

query_inputs["labels"] = torch.cat([torch.full((bs, v_token_length,), -100, device=query_inputs["input_ids"].device), query_inputs["input_ids"]], dim=1)
query_inputs["labels"] = query_inputs["labels"].type(torch.LongTensor)

But during training, I encountered the following error:

nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [0,0,0] Assertion `t >= 0 && t < n_classes` failed.

This indicates that a label falls outside the valid range [0, n_classes). However, I would not expect this error, because -100 is the default ignore_index in PyTorch. How can I resolve this issue?

The default ignore_index=-100 works for me:

import torch
import torch.nn as nn

batch_size = 16
nb_classes = 32
device = "cuda"

x = torch.randn(batch_size, nb_classes, requires_grad=True, device=device)
target = torch.randint(0, nb_classes, (batch_size,), device=device)

criterion = nn.CrossEntropyLoss()

loss = criterion(x, target)
print(loss)

# use the ignored index
target[0] = -100
loss = criterion(x, target)
print(loss)

# use an invalid index (not in [0, nb_classes) and not equal to ignore_index)
target[1] = -101
loss = criterion(x, target)
# /pytorch/aten/src/ATen/native/cuda/Loss.cu:250: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [1,0,0] Assertion `t >= 0 && t < n_classes` failed.
print(loss)
# RuntimeError: CUDA error: device-side assert triggered
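
Since -100 itself is accepted, the assertion is most likely triggered by some other label value, for example a token id that is greater than or equal to the number of classes in the logits. Here is a minimal sketch of how you could check this on the CPU before calling the loss. The helper name find_invalid_labels and the example values are hypothetical, and n_classes is assumed to match the size of the class dimension of your logits:

```python
import torch
import torch.nn as nn


def find_invalid_labels(labels, n_classes, ignore_index=-100):
    """Return the unique label values that the loss would reject.

    Hypothetical helper: a label is valid if it equals ignore_index
    or lies in [0, n_classes).
    """
    not_ignored = labels != ignore_index
    out_of_range = (labels < 0) | (labels >= n_classes)
    return torch.unique(labels[not_ignored & out_of_range])


n_classes = 32  # assumption: in real code, use logits.size(-1)
labels = torch.tensor([3, -100, 31, 32, -101])

# -100 is skipped; 32 and -101 are reported as invalid
invalid = find_invalid_labels(labels, n_classes)
print(invalid)

# Sanity check of the ignore_index behavior on the CPU: with the default
# mean reduction, ignored targets do not contribute to the loss at all.
logits = torch.randn(4, n_classes)
target = torch.tensor([1, 2, -100, 5])
criterion = nn.CrossEntropyLoss()
full = criterion(logits, target)
kept = criterion(logits[target != -100], target[target != -100])
print(torch.allclose(full, kept))
```

If the helper returns a non-empty tensor for your labels, those are the values to look at, e.g. input_ids that exceed the size of the model's output dimension.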