F.one_hot error: class values must be non-negative

Hi everyone! I’m performing a NER task on a custom dataset using transformers (a RoBERTa-based language model). Due to an imbalanced training set, I decided to use DiceLoss as my loss function, taken directly from the official code on GitHub (dice_loss_for_NLP).
My task has 38 labels, and the model deals with special tokens (used as sentence separators and for padding) by setting their labels to -100, “so they are automatically ignored in the default loss function” (cross entropy). This is the part of the dice_loss.py code where I get the “class values must be non-negative” exception:

def _multiple_class(self, input, target, logits_size, mask=None):
    flat_input = input
    flat_target = F.one_hot(target, num_classes=logits_size).float() if self.index_label_position else target.float()
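
To reproduce the exception outside the DiceLoss code, something like this is enough (the label values are made up, only the -100 matters):

    import torch
    import torch.nn.functional as F

    labels = torch.tensor([5, 0, -100, 12])  # -100 marks a special token
    F.one_hot(labels, num_classes=38)        # RuntimeError: Class values must be non-negative.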

Is there a way to make one_hot() ignore the -100 labels? I tried to cope with this by applying a mask to the labels tensor that basically replaces -100 with 0, but I actually have a class 0, so wouldn’t that affect the loss computation?

My “key-role” tensors are:

  • labels, the target in the function above → shape: (batch size, num tokens in the batch)
  • logits, the input in the function above → shape: (batch size, num tokens in the batch, num labels)
  • predictions → shape: (batch size, num tokens in the batch)

I can feed the DiceLoss either with logits or predictions.
Sorry if it sounds stupid but I’m new to PyTorch and transformers in general :confused:
Thanks for the help!

Hi,
I found an implementation of dice loss here, which has an ignore_index option.
It’s for image segmentation, but you should be able to reshape your model output and target to use it directly: (N, C, W) -> (N, C, W, 1).
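
For example, assuming logits of shape (batch, tokens, num_labels) and labels of shape (batch, tokens), the reshape could look roughly like this (tensor names are placeholders):

    # (batch, tokens, num_labels) -> (batch, num_labels, tokens, 1)
    seg_logits = logits.permute(0, 2, 1).unsqueeze(-1)
    # (batch, tokens) -> (batch, tokens, 1)
    seg_labels = labels.unsqueeze(-1)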

I checked the code, and I think it’s wrong.
Don’t use it. Sorry. It is replacing ignore_index with zero.


@marcomatta Did you find any solution to the problem?

one_hot does not support negative values, but you could replace the negative labels with any valid class index and ignore those positions in the loss with a mask. Since the masked positions don’t contribute to the loss, reusing class 0 as the placeholder doesn’t affect the computation.

Something like:

def nonnegative_multiple_class(self, input, target, logits_size):
    # positions of the ignored labels (e.g. -100 for special tokens)
    ignore_positions = target < 0

    # replace them with any valid class index (0 here) so one_hot doesn't fail;
    # the mask below removes their contribution to the loss anyway
    target = target.clone()
    target[ignore_positions] = 0

    # zero out the ignored positions in both the input and the one-hot target
    mask = torch.ones_like(input)
    mask[ignore_positions] = 0

    return self._multiple_class(input, target, logits_size, mask=mask)
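
Assuming you add the method above to the DiceLoss class from dice_loss_for_NLP, the call could then look roughly like this (num_labels stands in for your 38 classes; construct DiceLoss however you already do):

    dice = DiceLoss()

    # flatten (batch, tokens, num_labels) and (batch, tokens) before the call
    loss = dice.nonnegative_multiple_class(
        logits.reshape(-1, num_labels),
        labels.reshape(-1),
        logits_size=num_labels,
    )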