The loss classes defined in torch.nn.modules.loss take a parameter, ignore_index, that excludes specific target positions from the loss. I used -100 to pad the labels during training. This is the padding code I used:
query_inputs["labels"] = torch.cat([torch.full((bs, v_token_length,), -100, device=query_inputs["input_ids"].device), query_inputs["input_ids"]], dim=1)
query_inputs["labels"] = query_inputs["labels"].type(torch.LongTensor)
But during training, I encountered the following error:
nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [0,0,0] Assertion `t >= 0 && t < n_classes` failed.
This indicates that a label falls outside the valid range [0, n_classes). However, -100 shouldn't trigger this assertion, because it is the default ignore index in PyTorch. How can I resolve this issue?
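
For reference, here is a minimal sketch of how the labels could be checked on the CPU before the loss is computed. It assumes the loss is a standard cross-entropy with ignore_index=-100. The helper find_invalid_labels, the class count n_classes = 10, and the toy tensors are hypothetical and are not part of the training code above. One possibility it can help confirm or rule out is that the failing label is not the -100 padding at all, but an unpadded token id that is greater than or equal to the number of output classes.

```python
import torch
import torch.nn.functional as F


def find_invalid_labels(labels, n_classes, ignore_index=-100):
    """Return the unique label values that are neither ignore_index
    nor inside the valid class range [0, n_classes)."""
    labels = labels.detach().cpu()
    valid = (labels == ignore_index) | ((labels >= 0) & (labels < n_classes))
    return torch.unique(labels[~valid])


# Toy setup (assumed values): batch of 2, sequence length 5, 10 classes.
n_classes = 10
logits = torch.randn(2, 5, n_classes)
labels = torch.tensor([[-100, -100, 3, 4, 9],
                       [-100, -100, 0, 1, 2]])

# Positions labelled -100 are skipped, so this computes a finite loss.
loss = F.cross_entropy(logits.view(-1, n_classes), labels.view(-1),
                       ignore_index=-100)

# A label equal to n_classes is out of range and is reported by the helper.
bad_labels = labels.clone()
bad_labels[0, 2] = 10
print(find_invalid_labels(labels, n_classes))
print(find_invalid_labels(bad_labels, n_classes))
```

Running the same check on query_inputs["labels"], with n_classes set to the size of the last dimension of the model's logits, would show which values (if any) fall outside the range the assertion expects.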