Why does ignore_index ignore the entire example and not the class?

Andrew_Holmes · March 29, 2024, 11:25pm

I was looking into the backward pass of CrossEntropyLoss and noticed that ignore_index zeros the gradient for an entire row (example) rather than a column (class). Is this intended behavior? To me it seems that the class we don’t want to compute the gradient for should have its column set to zero, leaving the others unaffected. Instead, the row whose target is ignore_index is ignored, along with every class in it. My guess is that you don’t want to compute gradients for an example containing the class you want to ignore, so you ignore the entire example. But don’t the gradients for the other classes, the ones that aren’t ignore_index, still matter?

import torch
import torch.nn.functional as torchf

ignore_index = 0
# Deterministic example logits: 4 examples, 3 classes
x = torch.arange(12).reshape(4, 3).float().requires_grad_()
y = torch.tensor([1, 2, 0, 1])  # targets; the 3rd example has class 0
loss = torchf.cross_entropy(x, y, ignore_index=ignore_index)
print(loss)
loss.backward()
print(x.grad)

python3 tests.py
tensor(1.0743, grad_fn=<NllLossBackward0>)
tensor([[ 0.0300, -0.2518,  0.2217],
        [ 0.0300,  0.0816, -0.1116],
        [ 0.0000,  0.0000,  0.0000],
        [ 0.0300, -0.2518,  0.2217]])

Why not this instead?

tensor([[ 0.0000, -0.2518,  0.2217],
        [ 0.0000,  0.0816, -0.1116],
        [ 0.0000,  0.abcd,  0.wxyz],
        [ 0.0000, -0.2518,  0.2217]])

abcd and wxyz are just placeholders for whatever those values would have been if they weren’t zeroed.

srishti-git1110 · March 30, 2024, 4:20pm

Hi Andrew,
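Please note that ignore_index specifies a target value, not a column of the logits: any example whose ground-truth label equals ignore_index is left out of the loss and so contributes nothing to the gradient. In your case ignore_index is 0, and the 3rd example in the batch has class 0 as its ground truth, so its input gradients are all 0. The other examples keep a nonzero gradient in column 0 because the softmax couples all the logits of an example, so the class-0 logit still affects their loss.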
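
One way to check this reading is to compare ignore_index against dropping the ignored rows by hand. The sketch below is only an illustration, not PyTorch’s internal implementation. The names keep and manual_grad are made up for the example, and it assumes the default reduction='mean', which averages over the non-ignored targets only.

```python
import torch
import torch.nn.functional as F

ignore_index = 0
logits = torch.arange(12).reshape(4, 3).float()  # same data as in the question
targets = torch.tensor([1, 2, 0, 1])

# Path 1: let cross_entropy handle ignore_index.
x1 = logits.clone().requires_grad_()
loss1 = F.cross_entropy(x1, targets, ignore_index=ignore_index)
loss1.backward()

# Path 2: drop the examples whose target equals ignore_index by hand.
x2 = logits.clone().requires_grad_()
keep = targets != ignore_index  # hypothetical helper mask, one entry per example
loss2 = F.cross_entropy(x2[keep], targets[keep])
loss2.backward()

# Path 3: closed-form gradient (softmax - one_hot) / N_kept, zeroed on ignored rows.
probs = torch.softmax(logits, dim=1)
one_hot = F.one_hot(targets, num_classes=logits.shape[1]).float()
manual_grad = (probs - one_hot) * keep.unsqueeze(1) / keep.sum()

print(torch.allclose(loss1, loss2, atol=1e-6))
print(torch.allclose(x1.grad, x2.grad, atol=1e-6))
print(torch.allclose(x1.grad, manual_grad, atol=1e-6))
```

If all three comparisons agree, ignore_index behaves like a per-example mask on the targets. Column 0 of the remaining rows stays nonzero because it is just the softmax probability of class 0 divided by the number of kept examples.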