Hi, I’m working on an NER model where a single token can carry multiple labels, so I chose BCELoss instead of CrossEntropyLoss.
CrossEntropyLoss has an ignore_index feature, so it can ignore the positions where attention_mask is 0. With BCELoss, the only way I found to do this is to manually filter out the unwanted positions, as in the following code:
loss_fct = torch.nn.BCELoss()
# Only keep active parts of the loss
if attention_mask is not None:
    active_logits = logits.view(-1, self.num_labels)[attention_mask.view(-1) == 1]
    active_labels = labels.view(-1, self.num_labels)[attention_mask.view(-1) == 1]
    loss = loss_fct(active_logits, active_labels)
The model trains successfully: the loss goes down and evaluation looks fine. Still, I’m curious whether manually filtering the output with attention_mask affects loss.backward(), and if so, why.
Thank you.
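To make the question concrete, here is a small self-contained experiment (toy shapes and random tensors of my own, not the real model) that mimics the masking above and then inspects the gradient of the logits after backward:

```python
import torch

torch.manual_seed(0)
batch, seq_len, num_labels = 2, 4, 3

# Stand-ins for the model outputs (already in [0, 1], as BCELoss requires)
logits = torch.rand(batch, seq_len, num_labels, requires_grad=True)
labels = torch.randint(0, 2, (batch, seq_len, num_labels)).float()
attention_mask = torch.tensor([[1, 1, 1, 0],
                               [1, 1, 0, 0]])

loss_fct = torch.nn.BCELoss()
mask = attention_mask.view(-1) == 1
active_logits = logits.view(-1, num_labels)[mask]
active_labels = labels.view(-1, num_labels)[mask]
loss = loss_fct(active_logits, active_labels)
loss.backward()

grad = logits.grad.view(-1, num_labels)
print(grad[~mask])                     # padded positions: all zeros
print(grad[mask].abs().sum() > 0)      # real tokens: nonzero gradients
```

Running this, the masked-out rows come back with zero gradient while the kept rows get nonzero gradients, which is what I would expect if the indexing is handled correctly by autograd.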
edit:
- the full code is here
- I see that the loss output of loss_fct has an attribute called grad_fn, and loss.grad_fn traces back to an index_select node. So when I call loss.backward(), will the gradient only affect the elements whose indices were selected?
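A tiny check of that last point on toy tensors (again, my own made-up example): the backward of an indexing/select op scatters the incoming gradient back to the selected positions and writes zero everywhere else.

```python
import torch

# Select two elements out of six, then backprop through the selection
x = torch.arange(6., requires_grad=True)
y = x[torch.tensor([1, 3])]   # advanced indexing -> index-select backward node
print(y.grad_fn)              # the autograd node recorded for the indexing
y.sum().backward()
print(x.grad)                 # tensor([0., 1., 0., 1., 0., 0.])
```

Only positions 1 and 3 receive gradient; the unselected positions get exactly zero.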