F.one_hot error: class values must be non-negative

Hi everyone! I'm performing a NER task on a custom dataset using transformers (a RoBERTa-based language model). Because the training set is imbalanced, I decided to use the DiceLoss function, taken directly from the official code on GitHub (dice_loss_for_NLP).
My task has 38 labels, and the model handles special tokens (used as sentence separators and padding labels) by setting their labels to -100, "so they are automatically ignored in the default loss function" (cross entropy). This is the part of the dice_loss.py code where I get the "class values must be non-negative" exception:

def _multiple_class(self, input, target, logits_size, mask=None):
    flat_input = input
    flat_target = F.one_hot(target, num_classes=logits_size).float() if self.index_label_position else target.float()
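
For context, the exception is easy to reproduce outside the dice loss code with toy values:

import torch
import torch.nn.functional as F

target = torch.tensor([3, -100, 7])  # -100 marks special tokens
F.one_hot(target, num_classes=38)    # RuntimeError: class values must be non-negative.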

Is there a way to make one_hot() ignore the -100 labels? I tried to cope with this by applying a mask to the label tensor that replaces -100 with 0, but I actually have a class 0; wouldn't that affect the loss computation?
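
To make the question concrete, here is a rough sketch of that workaround with toy values (the variable names are mine, and the last line is my guess at keeping masked positions from counting as class 0):

import torch
import torch.nn.functional as F

num_labels = 38
labels = torch.tensor([[5, 0, -100, 12, -100]])  # one toy sequence

mask = labels.ne(-100)                           # True for real tokens
safe_labels = labels.masked_fill(~mask, 0)       # -100 -> 0 so one_hot accepts it
one_hot = F.one_hot(safe_labels, num_classes=num_labels).float()
one_hot = one_hot * mask.unsqueeze(-1).float()   # zero out the rows of ignored positions

I'm not sure whether zeroing those rows is enough, or whether the ignored positions still leak into the dice denominator.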

My key tensors are:

  • labels, the target in the function above → shape: (batch size, num tokens in the batch)
  • logits, the input in the function above → shape: (batch size, num tokens in the batch, num labels)
  • predictions → shape: (batch size, num tokens in the batch)

I can feed DiceLoss with either the logits or the predictions.
Sorry if this sounds stupid, but I'm new to PyTorch and transformers in general :confused:
Thanks for the help!

Hi,
I found an implementation of dice loss here, which has an ignore_index option.
It's written for image segmentation, but you should be able to reshape your model output and target to use it directly (NxCxW -> NxCxWx1).
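
Something like this (just a sketch of the reshape, with toy sizes):

import torch

N, T, C = 2, 5, 38                                  # toy sizes: batch, tokens, labels
logits = torch.randn(N, T, C)                       # model output
target = torch.randint(0, C, (N, T))                # token labels

seg_logits = logits.permute(0, 2, 1).unsqueeze(-1)  # (N, T, C) -> (N, C, T, 1)
seg_target = target.unsqueeze(-1)                   # (N, T)    -> (N, T, 1)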

I checked the code, and I think it's wrong. Don't use it, sorry: it simply replaces ignore_index with zero (exactly the class-0 clash you were worried about).
