Ed_Moman
(Ed Moman)
October 10, 2023, 2:50pm
1
Hello,
I would like to create a custom loss function that takes into consideration only one of the labels.
I have three labels (0, 1, 2) and I would like to consider only 0.
This is my code:
```python
import torch
from transformers import Trainer

class CustomTrainer(Trainer):
    def compute_loss(self, model, inputs, return_outputs=False):
        labels = inputs.pop("labels")
        outputs = model(**inputs)
        logits = outputs.get("logits")
        label_tensor = labels.view(-1)
        output_tensor = logits.view(-1, self.model.config.num_labels)
        mask = label_tensor.eq(0)
        loss = torch.mean(torch.abs(torch.masked_select(output_tensor, mask)
                                    - torch.masked_select(label_tensor, mask)))
        return (loss, outputs) if return_outputs else loss
```
This is not working.
Any ideas?
Ed_Moman
October 11, 2023, 12:09pm
3
The following works (I do not know whether it makes sense, but at least there are no errors):
```python
import torch
from transformers import Trainer

class CustomTrainer(Trainer):
    def compute_loss(self, model, inputs, return_outputs=False):
        labels = inputs.pop("labels")
        outputs = model(**inputs)
        logits = outputs.get("logits")
        label_tensor = labels.view(-1)
        output_tensor = torch.argmax(logits, dim=-1)
        mask = label_tensor.eq(0)
        loss = (torch.masked_select(output_tensor, mask)
                - torch.masked_select(label_tensor, mask)).float().abs().mean()
        loss.requires_grad = True
        return (loss, outputs) if return_outputs else loss
```
Thanks a million!
The code looks wrong: calling torch.argmax on the logits detaches them from the computation graph, which is why PyTorch raises a valid error explaining that backward() cannot be called on the loss tensor. Setting loss.requires_grad = True does not re-attach the computation graph; it only masks the error. The model parameters will therefore not receive any valid gradients.
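A differentiable alternative (a sketch of the general idea, not code from this thread) is to mask the per-example loss terms rather than the argmax predictions: compute a standard cross-entropy with reduction="none", then average only over the examples whose label is 0. The function name `masked_loss` and the example tensors below are illustrative.

```python
import torch
import torch.nn as nn

def masked_loss(logits, labels, target_label=0):
    """Cross-entropy averaged only over examples with the target label,
    keeping the computation graph intact (no argmax, no requires_grad hack)."""
    labels = labels.view(-1)
    logits = logits.view(-1, logits.size(-1))
    per_example = nn.functional.cross_entropy(logits, labels, reduction="none")
    mask = labels.eq(target_label)
    if mask.any():
        return per_example[mask].mean()
    # If the batch has no target-label examples, return a zero loss that is
    # still attached to the graph so backward() remains valid.
    return logits.sum() * 0.0

# Illustrative usage with random logits and explicit labels
logits = torch.randn(8, 3, requires_grad=True)
labels = torch.tensor([0, 1, 2, 0, 1, 2, 0, 1])
loss = masked_loss(logits, labels)
loss.backward()  # gradients flow because argmax was never used
```

Inside `compute_loss` this would replace the masked-select arithmetic, with `target_label` set to the class of interest.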
Ed_Moman
October 12, 2023, 6:57am
5
Thank you. So perhaps the entire approach does not make sense. Or is there a workaround?
What I am doing right now is using weighted Cross Entropy Loss:
```python
import numpy as np
import torch
from torch import nn
from sklearn.utils.class_weight import compute_class_weight
from transformers import Trainer

class_weights = compute_class_weight(class_weight='balanced',
                                     classes=np.unique(df['labels']),
                                     y=df['labels'])
c_w = np.fromiter((np.ceil(i / min(class_weights)) for i in class_weights),
                  dtype=np.float32)
c_w[np.argmax(c_w)] = np.max(c_w) + 1

class CustomTrainer(Trainer):
    def compute_loss(self, model, inputs, return_outputs=False):
        labels = inputs.pop("labels")
        # forward pass
        outputs = model(**inputs)
        logits = outputs.get("logits")
        # compute custom loss (suppose one has 3 labels with different weights)
        loss_fct = nn.CrossEntropyLoss(weight=torch.tensor(c_w, device=model.device))
        loss = loss_fct(logits.view(-1, self.model.config.num_labels), labels.view(-1))
        return (loss, outputs) if return_outputs else loss
```
But this still fits primarily the majority class, whereas I am interested in one of the minority classes. If I increase the weight of the target class above a certain threshold, the model just breaks.
Ed_Moman
October 12, 2023, 9:38am
6
I am trying upsampling…
Ed_Moman
October 12, 2023, 12:28pm
7
Upsampling seems to do the trick in this case. The training is much more stable. Then we will see how well the model generalises.
My guess (I do not know if it makes sense) is that the batch size for evaluation was too small to properly sample the minority classes. But I have very limited VRAM.
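For reference, one common way to upsample minority classes in PyTorch (a sketch, not the poster's actual code; the toy labels and feature tensor below are illustrative) is a WeightedRandomSampler that draws each example with probability inversely proportional to its class frequency. With a Hugging Face Trainer, such a sampler can be wired in by overriding get_train_dataloader.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, WeightedRandomSampler

# Toy imbalanced labels: class 0 appears once, class 1 three times, class 2 six times
labels = torch.tensor([0, 1, 1, 1, 2, 2, 2, 2, 2, 2])
class_counts = torch.bincount(labels).float()

# Each example's sampling weight is the inverse of its class size,
# so rare classes are drawn more often (with replacement)
sample_weights = 1.0 / class_counts[labels]
sampler = WeightedRandomSampler(sample_weights,
                                num_samples=len(labels),
                                replacement=True)

dataset = TensorDataset(torch.randn(len(labels), 4), labels)
loader = DataLoader(dataset, batch_size=4, sampler=sampler)
```

Each epoch then sees roughly balanced batches, at the cost of repeating minority-class examples.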