Hi, I’m working on an NER model where one word can carry multiple labels, so I chose BCELoss. Unlike CrossEntropyLoss, BCELoss has no `ignore_index` feature to skip the positions where `attention_mask` is 0, so the only way I found is to manually filter out the unwanted parts, as in the following code:
```python
loss_fct = torch.nn.BCELoss()
# Only keep active parts of the loss
if attention_mask is not None:
    active_logits = logits.view(-1, self.num_labels)[attention_mask.view(-1) == 1]
    active_labels = labels.view(-1, self.num_labels)[attention_mask.view(-1) == 1]
    loss = loss_fct(active_logits, active_labels)
```
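For context, here is a self-contained sketch of the same masking pattern (the shapes, seed, and tensor names are illustrative, not from my actual model). Note that BCELoss expects probabilities in [0, 1], so a sigmoid is applied first:

```python
import torch

torch.manual_seed(0)
num_labels = 3
raw = torch.randn(2, 4, num_labels, requires_grad=True)  # pre-sigmoid scores
probs = torch.sigmoid(raw)                               # BCELoss expects values in [0, 1]
labels = torch.randint(0, 2, (2, 4, num_labels)).float()
attention_mask = torch.tensor([[1, 1, 1, 0],
                               [1, 1, 0, 0]])

loss_fct = torch.nn.BCELoss()
# Only keep active parts of the loss
active = attention_mask.view(-1) == 1
active_probs = probs.view(-1, num_labels)[active]
active_labels = labels.view(-1, num_labels)[active]
loss = loss_fct(active_probs, active_labels)
loss.backward()

# Padding positions receive exactly zero gradient
print(raw.grad[0, 3].abs().sum().item())  # 0.0
```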
The model now trains successfully: the loss goes down and evaluation looks good. But I’m curious: does manually filtering the output with `attention_mask` affect `loss.backward()`? And why?
- the full code is here
- I see there is an attribute called `grad_fn`, and the tensor produced by the indexing carries an `index_select`-style backward object. So when I call `loss.backward()`, will it only affect the indices that were selected?