Hi, I’m working on an NER model where a single token can carry multiple labels, so I chose BCELoss instead of CrossEntropyLoss.
CrossEntropyLoss has an ignore_index feature, so it can ignore the positions where attention_mask is 0. With BCELoss, the only way I found to do this is to manually filter out the unwanted positions, as in the following code:
loss_fct = torch.nn.BCELoss()
# Only keep active parts of the loss
if attention_mask is not None:
    active_logits = logits.view(-1, self.num_labels)[attention_mask.view(-1) == 1]
    active_labels = labels.view(-1, self.num_labels)[attention_mask.view(-1) == 1]
    loss = loss_fct(active_logits, active_labels)
The model trains successfully: the loss goes down and evaluation looks fine. Still, I’m curious whether manually filtering the output with attention_mask affects loss.backward(), and if so, why.
Thank you.
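To make the question concrete, here is a small self-contained experiment (toy shapes and random tensors of my own, not the real model) that mimics the masking above and then inspects the gradient of the logits after backward:

```python
import torch

torch.manual_seed(0)
batch, seq_len, num_labels = 2, 4, 3

# Stand-ins for the model outputs (already in [0, 1], as BCELoss requires)
logits = torch.rand(batch, seq_len, num_labels, requires_grad=True)
labels = torch.randint(0, 2, (batch, seq_len, num_labels)).float()
attention_mask = torch.tensor([[1, 1, 1, 0],
                               [1, 1, 0, 0]])

loss_fct = torch.nn.BCELoss()
mask = attention_mask.view(-1) == 1
active_logits = logits.view(-1, num_labels)[mask]
active_labels = labels.view(-1, num_labels)[mask]
loss = loss_fct(active_logits, active_labels)
loss.backward()

grad = logits.grad.view(-1, num_labels)
print(grad[~mask])                     # padded positions: all zeros
print(grad[mask].abs().sum() > 0)      # real tokens: nonzero gradients
```

Running this, the masked-out rows come back with zero gradient while the kept rows get nonzero gradients, which is what I would expect if the indexing is handled correctly by autograd.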
edit:
- the full code is here
- I see that the loss output of loss_fct has an attribute called grad_fn, and loss.grad_fn traces back to an index_select node. So when I call loss.backward(), will the gradient only affect the elements whose indices were selected?
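A tiny check of that last point on toy tensors (again, my own made-up example): the backward of an indexing/select op scatters the incoming gradient back to the selected positions and writes zero everywhere else.

```python
import torch

# Select two elements out of six, then backprop through the selection
x = torch.arange(6., requires_grad=True)
y = x[torch.tensor([1, 3])]   # advanced indexing -> index-select backward node
print(y.grad_fn)              # the autograd node recorded for the indexing
y.sum().backward()
print(x.grad)                 # tensor([0., 1., 0., 1., 0., 0.])
```

Only positions 1 and 3 receive gradient; the unselected positions get exactly zero.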