Does loss.backward() work if I filter the tensor manually?

Hi, I’m working on an NER model with multiple labels per word, so I chose BCELoss instead of CrossEntropyLoss.

CrossEntropyLoss has an ignore_index feature, so it can ignore the positions where attention_mask is 0.

With BCELoss, I can only do this by manually filtering out the unwanted parts, as in the following code.

loss_fct = torch.nn.BCELoss()
# Only keep active parts of the loss
if attention_mask is not None:
    active_logits = logits.view(-1, self.num_labels)[attention_mask.view(-1) == 1]
    active_labels = labels.view(-1, self.num_labels)[attention_mask.view(-1) == 1]
    loss = loss_fct(active_logits, active_labels)

The model now trains successfully: the loss goes down and evaluation looks fine.

However, I’m curious: does manually filtering the output with attention_mask affect loss.backward()? And why?

Thank you.

edit:

  1. The full code is here.
  2. I see that the loss output of loss_fct has an attribute called grad_fn, and it is an index_select object. So when I call loss.backward(), will it only affect the values whose indices were selected? (A minimal sketch of how I inspected this is below the list.)
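For reference, here is a minimal sketch (dummy tensors with hypothetical shapes, not my actual model) of how the indexing shows up in the autograd graph:

import torch

# Dummy tensors, just to inspect the graph
logits = torch.randn(4, 3, requires_grad=True)
attention_mask = torch.tensor([1, 0, 1, 1])

active_logits = logits[attention_mask == 1]   # boolean indexing is recorded by autograd
print(active_logits.grad_fn)                  # an indexing backward node (exact name varies by PyTorch version)
print(active_logits.grad_fn.next_functions)   # its parents in the graph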

Yes, indexing the model output and target works, and the gradient will be backpropagated only to the selected values:

import torch
import torch.nn as nn

output = torch.randn(10, 1, requires_grad=True)
target = torch.randint(0, 2, (10, 1)).float()
criterion = nn.BCEWithLogitsLoss()

# Compute the loss only on the selected batch indices
batch_idx = torch.tensor([1, 3, 5, 7])
loss = criterion(output[batch_idx], target[batch_idx])
loss.backward()

print(output.grad)
> tensor([[ 0.0000],
          [-0.1686],
          [ 0.0000],
          [ 0.1138],
          [ 0.0000],
          [ 0.2076],
          [ 0.0000],
          [-0.1147],
          [ 0.0000],
          [ 0.0000]])
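Here is a similar minimal sketch with a dummy attention_mask (hypothetical shapes, not your exact model), showing that the masked positions simply receive zero gradient:

import torch
import torch.nn as nn

# Hypothetical shapes: batch of 2 sequences, 4 tokens each, 3 labels per token
logits = torch.randn(2, 4, 3, requires_grad=True)
labels = torch.randint(0, 2, (2, 4, 3)).float()
attention_mask = torch.tensor([[1, 1, 1, 0],
                               [1, 1, 0, 0]])

criterion = nn.BCEWithLogitsLoss()
active = attention_mask.view(-1) == 1
loss = criterion(logits.view(-1, 3)[active], labels.view(-1, 3)[active])
loss.backward()

# Padded positions (attention_mask == 0) get zero gradient
print(logits.grad[0, 3])   # tensor([0., 0., 0.])
print(logits.grad[1, 2:])  # all zeros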

Thanks for your reply! That is clear and easy to understand :grinning: