Hi everyone! I’m performing a NER task on a custom dataset using transformers (a RoBERTa-based language model). Since my training set is imbalanced, I decided to use the Dice loss, taken directly from the official code on GitHub (dice_loss_for_NLP).
My task has 38 labels, and the model deals with special tokens (used as sentence separators and padding) by setting their labels to -100, “so they are automatically ignored in the default loss function” (cross entropy). This is the part of the dice_loss.py code that raises the “class values must be non-negative” exception:
```python
def _multiple_class(self, input, target, logits_size, mask=None):
    flat_input = input
    # F.one_hot() raises "class values must be non-negative" when target contains -100
    flat_target = F.one_hot(target, num_classes=logits_size).float() if self.index_label_position else target.float()
```
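For context, this is the default behaviour I was relying on before switching losses: cross entropy skips the -100 positions via its ignore_index argument (a minimal sketch):

```python
import torch
import torch.nn as nn

# -100 is the default ignore_index, so the second token contributes nothing
ce = nn.CrossEntropyLoss(ignore_index=-100)
logits = torch.randn(2, 38)        # two tokens, 38 classes
labels = torch.tensor([5, -100])   # second token is a special token
loss = ce(logits, labels)          # computed on the first token only
```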
Is there a way to make one_hot() ignore the -100 labels? I tried to cope with this by applying a mask to the labels tensor that replaces -100 with 0, but I actually have a class 0, so wouldn’t that affect the loss computation?
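Roughly what I tried (just a sketch with illustrative names, not the actual repo code):

```python
import torch.nn.functional as F

# Replace the -100 sentinel with 0 so one_hot() stops complaining...
safe_target = target.clone()
safe_target[target == -100] = 0    # ...but 0 is also a real label in my scheme
flat_target = F.one_hot(safe_target, num_classes=logits_size).float()
```

My worry is that the special tokens now look like genuine class-0 tokens to the loss.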
My “key-role” tensors are:
- labels, the target in the function above → shape: (batch size, num tokens in the batch)
- logits, the input in the function above → shape: (batch size, num tokens in the batch, num labels)
- predictions → shape: (batch size, num tokens in the batch)
I can feed DiceLoss either the logits or the predictions.
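What I’d ideally like is to drop the -100 positions entirely before the one-hot encoding, so class 0 stays untouched. Something along these lines, using the shapes above (again just a sketch; I don’t know if it plays well with the rest of DiceLoss):

```python
import torch.nn.functional as F

ignore_index = -100
active = labels != ignore_index    # (batch_size, num_tokens) boolean mask
active_labels = labels[active]     # 1-D tensor with only the real label ids
active_logits = logits[active]     # (num_active_tokens, num_labels)
flat_target = F.one_hot(active_labels, num_classes=logits.size(-1)).float()
```

Would filtering the tensors like this before calling the loss be the right approach, or is that what the mask argument of _multiple_class is meant for?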
Sorry if this sounds stupid, but I’m new to PyTorch and transformers in general.
Thanks for the help!