Custom loss for huggingface Trainer for sequences

I have a problem: the following custom loss raises this error:

class CustomTrainer(Trainer):
  def compute_loss(self, model, inputs, return_outputs=False):
    use_cuda = torch.cuda.is_available()
    device = torch.device("cuda" if use_cuda else "cpu")
    # forward pass
    outputs = model(**inputs)
    logits = outputs.get("logits")

    # logits: (batch_size, sequence_length, num_classes), reshape it to (batch_size*sequence_length, num_classes)
    predictions = logits.view(-1, model.config.num_labels)
    # shape: (batch_size*sequence_length)
    predictions = predictions.argmax(dim=-1)

    labels = inputs.get("labels").to(device)
    labels =  labels.view(-1) 

    labels = torch.tensor(labels, dtype=torch.float)
    predictions = torch.tensor(predictions, dtype=torch.float, requires_grad=True)

    loss = loss_fct(predictions, labels)

    return (loss, outputs) if return_outputs else loss


IndexError: Target 17 is out of bounds.

When I remove the lines that convert the dtype:

#labels = torch.tensor(labels, dtype=torch.float)
#predictions = torch.tensor(predictions, dtype=torch.float, requires_grad=True)

I got this error:

RuntimeError: Expected floating point type for target with class probabilities, got Long

I guess you might be using nn.CrossEntropyLoss as the loss_fct?
If so, note that this criterion accepts model outputs in the shape [batch_size, nb_classes, *] and targets as LongTensors in the shape [batch_size, *] containing class indices in the range [0, nb_classes-1] as well as FloatTensors in the same shape as the model output containing probabilities.
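To make the accepted shapes and dtypes concrete, here is a minimal sketch. The sizes (4, 128, 17) are the ones reported later in this thread, the tensors are random placeholders rather than real model outputs, and the probability-target case assumes a PyTorch version that supports it (the error message above suggests that is the case here).

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
batch_size, seq_length, n_classes = 4, 128, 17  # sizes taken from this thread
loss_fct = nn.CrossEntropyLoss()

# Case 1: input [N, C] with LongTensor targets [N] holding class indices.
logits_2d = torch.randn(batch_size * seq_length, n_classes, requires_grad=True)
targets_idx = torch.randint(0, n_classes, (batch_size * seq_length,))  # values in [0, 16]
loss_idx = loss_fct(logits_2d, targets_idx)

# Case 2: input [N, C, L] with LongTensor targets [N, L].
# Note that the class dimension comes second, not last.
logits_3d = torch.randn(batch_size, n_classes, seq_length, requires_grad=True)
targets_seq = torch.randint(0, n_classes, (batch_size, seq_length))
loss_seq = loss_fct(logits_3d, targets_seq)

# Case 3: FloatTensor targets with the same shape as the input,
# interpreted as class probabilities (each row sums to 1).
targets_prob = torch.softmax(torch.randn(batch_size * seq_length, n_classes), dim=-1)
loss_prob = loss_fct(logits_2d, targets_prob)

print(loss_idx.shape, loss_seq.shape, loss_prob.shape)  # all scalar losses
```

In all three cases the first argument is the raw logits tensor, not the argmax of it.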

Also, these lines look wrong:

predictions = predictions.argmax(dim=-1)
predictions = torch.tensor(predictions, dtype=torch.float, requires_grad=True)

The argmax operation is not differentiable, so it detaches the predictions tensor from the computation graph. It seems you are trying to fix this by re-wrapping the tensor, but that creates a new leaf tensor and will not "re-attach" it to the computation graph.
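A small sketch of what happens, using a random tensor as a stand-in for the model output. It uses clone().detach() instead of torch.tensor(tensor, ...), which is the form PyTorch recommends for copying an existing tensor; the effect on the graph is the same.

```python
import torch

torch.manual_seed(0)
logits = torch.randn(4, 17, requires_grad=True)  # stand-in for model logits
scores = logits * 2.0                            # part of the graph, has a grad_fn

hard = scores.argmax(dim=-1)                     # integer class indices
print(hard.dtype, hard.requires_grad, hard.grad_fn)  # long tensor, no grad_fn

# Re-wrapping produces a brand new leaf tensor with no history.
rewrapped = hard.clone().detach().to(torch.float).requires_grad_(True)
print(rewrapped.is_leaf, rewrapped.grad_fn)

# backward() now runs without an error, but the gradient stops at `rewrapped`
# and never reaches `logits` (and so would never reach the model weights).
rewrapped.sum().backward()
print(rewrapped.grad is not None, logits.grad is None)
```

So the error disappears, but nothing upstream of the re-wrapped tensor would be trained.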

How to fix that?

I removed this line:
predictions = predictions.argmax(dim=-1)

and got this error:

/usr/local/lib/python3.8/dist-packages/torch/nn/ in cross_entropy(input, target, weight, size_average, ignore_index, reduce, reduction, label_smoothing)
IndexError: Target 17 is out of bounds.

The shapes are as follows:

batch_size = 4
seq_length = 128
n_classes = 17

Before logits shape : torch.Size([4, 128, 17])

After predictions shape : torch.Size([512, 17])

Before labels shape : torch.Size([4, 128])
After labels shape : torch.Size([512])

So the input of the CrossEntropyLoss:
predictions shape : torch.Size([512, 17]) : (batch_size * seq_length, n_classes)
labels shape : torch.Size([512]): (batch_size * seq_length,)
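With those shapes, the loss can be computed directly on the flattened logits. Below is a minimal sketch; the helper name token_classification_loss is hypothetical, and ignore_index=-100 is an assumption (it is the nn.CrossEntropyLoss default and the value commonly used to mask padded tokens, but I don't know whether your data uses it).

```python
import torch
import torch.nn as nn

def token_classification_loss(logits, labels, num_labels, ignore_index=-100):
    """Hypothetical helper: cross entropy over every token in the batch."""
    loss_fct = nn.CrossEntropyLoss(ignore_index=ignore_index)
    flat_logits = logits.reshape(-1, num_labels)  # (batch*seq, num_labels), still in the graph
    flat_labels = labels.reshape(-1).long()       # (batch*seq,), class indices
    return loss_fct(flat_logits, flat_labels)

torch.manual_seed(0)
logits = torch.randn(4, 128, 17, requires_grad=True)  # placeholder for outputs.logits
labels = torch.randint(0, 17, (4, 128))               # valid classes 0..16
labels[:, 0] = -100                                   # pretend the first token is masked

loss = token_classification_loss(logits, labels, num_labels=17)
loss.backward()
print(loss.item(), logits.grad.shape)
```

Inside compute_loss this would be called with outputs.get("logits"), inputs.get("labels") and model.config.num_labels, with no argmax and no dtype conversion to float.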

@ptrblck, following the input shape you mentioned for CrossEntropyLoss, I did the following and still get this error:

IndexError: Target 17 is out of bounds.

predictions = logits.view(batch_size, model.config.num_labels, sequence_length)  # shape: (batch_size, n_classes, seq_length)
labels = inputs.get("labels").to(device) #  (batch_size, seq_length)
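A side note on the snippet above: view only reinterprets the memory layout and does not move the class dimension, so the values end up scrambled even though the shape looks right. Swapping the axes needs permute (or transpose). A tiny sketch with made-up sizes so the numbers are easy to follow:

```python
import torch

batch_size, seq_length, n_classes = 2, 3, 4  # small illustrative sizes
# Entry [b, t, c] holds the score of class c for token t of sample b.
logits = torch.arange(batch_size * seq_length * n_classes, dtype=torch.float)
logits = logits.view(batch_size, seq_length, n_classes)

viewed = logits.view(batch_size, n_classes, seq_length)  # same memory, new shape
permuted = logits.permute(0, 2, 1)                       # axes actually swapped

expected = logits[0, 1, :]   # class scores of token 1 in sample 0
print(expected)              # tensor([4., 5., 6., 7.])
print(permuted[0, :, 1])     # tensor([4., 5., 6., 7.])
print(viewed[0, :, 1])       # tensor([ 1.,  4.,  7., 10.])
```

This mismatch silently trains on the wrong scores, but it is not what triggers the out-of-bounds target error.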

Maybe you are already aware of this.
The class labels should be from 0 to n-1. It seems you are using class labels from 1 to n?

I checked the dictionary model.config.label2id that I used to prepare the labels for training, and the classes run from 0 to n-1, i.e. 0 to 16.

You may have to check where the indexing value 17 comes from.
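One way to narrow this down is to check each batch right before the loss call. This is only a sketch: check_label_range is a hypothetical helper, and treating -100 as a masked value is an assumption about your data.

```python
import torch

def check_label_range(labels, num_labels, ignore_index=-100):
    """Hypothetical helper: return the distinct label values outside [0, num_labels - 1]."""
    flat = labels.reshape(-1)
    valid = flat[flat != ignore_index]  # drop masked positions (assumed to be -100)
    bad = valid[(valid < 0) | (valid >= num_labels)]
    return torch.unique(bad).tolist()

# Toy batch with one offending value.
labels = torch.tensor([[0, 5, 16, -100],
                       [3, 17, 2, -100]])
print(check_label_range(labels, num_labels=17))  # [17]
```

Calling it on inputs.get("labels") at the top of compute_loss and raising as soon as the list is non-empty would show which batch contains the value 17.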

I used the same training data with the default Trainer and it worked without this error. I think the default loss is CrossEntropyLoss, so the targets should not be out of bounds.

I also printed the labels inside compute_loss and did not see the value 17 before the error appeared.

@ptrblck can we get the same functionality as predictions = predictions.argmax(dim=-1) without detaching the predictions tensor from the computation graph?

I encountered this error:

RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

If I replace argmax with torch.max, it shouldn't detach the tensor from the graph:
_, predictions = torch.max(logits, dim=2)
or even this:
_, predictions = torch.max(torch.tensor(logits, requires_grad=True), dim=2)

Any idea? @ptrblck @InnovArul

No, argmax is not differentiable as already mentioned.
torch.max will return the max values (which are still attached to the computation graph) and the argmax (which will be detached as it’s not differentiable).
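A quick check of the two outputs, with a random tensor standing in for the logits (sizes taken from this thread):

```python
import torch

torch.manual_seed(0)
logits = torch.randn(4, 128, 17, requires_grad=True)  # placeholder for model logits

values, indices = torch.max(logits, dim=2)

# The max values are still part of the graph.
print(values.requires_grad, values.grad_fn is not None)  # True True

# The indices are integers and carry no gradient information.
print(indices.dtype, indices.requires_grad)              # torch.int64 False

# Gradients flow through `values` only: each (sample, token) position
# receives a gradient of 1 at its maximum entry and 0 elsewhere.
values.sum().backward()
print(logits.grad.sum().item())                          # 512.0
```

So writing _, predictions = torch.max(...) keeps exactly the part that is detached and throws away the part that is not.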
