When to use ignore_index?

I noticed CrossEntropyLoss now has an ignore_index parameter, but I am not sure I understand what that means:

ignore_index (int, optional): Specifies a target value that is ignored
            and does not contribute to the input gradient. When size_average is
            True, the loss is averaged over non-ignored targets.

First of all, I know that CrossEntropyLoss takes a 1-dimensional array of targets:

Target: :math:`(N)` where each value is `0 <= targets[i] <= C-1`

So then I assume that ignore_index allows you to ignore one of the outputs in the loss calculation. I can imagine it being useful to mask a whole bunch of outputs, but what is the use case of ignoring only a single output node?

I have probably misunderstood what ignore_index does, so when do people actually use it?

This is used to mask a specific label.
For example, in semantic segmentation we might have a -1 label that stands for “don’t care”, meaning that whatever you predict in that region is not taken into account in the evaluation (because it can be ambiguous).
In this case, you would set the ignore_index to -1, so that those indices are not taken into account.
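Something like this minimal sketch (the shapes are made up, and -1 is just the “don’t care” convention mentioned above):

import torch
import torch.nn as nn

# a minimal sketch with made-up shapes: 2 images, 5 classes, 4x4 maps
output = torch.randn(2, 5, 4, 4, requires_grad=True)  # logits (N, C, H, W)
target = torch.randint(0, 5, (2, 4, 4))               # labels (N, H, W)
target[:, 0, :] = -1                                  # "don't care" region

criterion = nn.CrossEntropyLoss(ignore_index=-1)
loss = criterion(output, target)   # ignored pixels don't contribute
loss.backward()                    # and their input gradients are zero
print(loss)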

Thanks for the clarification!

Hi,

I would like to know more. I am using the torchvision segmentation model from your repo. There, the ignore_index is set to 255.

If using a dataset that has multiple classes to ignore during evaluation, say Cityscapes, I am manually using the training IDs and setting all of those classes to 0, which is the same as the background class. How can I leverage ignore_index here?

Thanks

I am facing an issue when using ignore_index on the Cityscapes dataset for semantic segmentation. There are 19 classes plus one extra ignore class. I mapped this ignore class to 255 and used the same value inside the cross entropy loss. When I run the model with 19 output classes, I get an assertion error from the cross entropy loss.
Error:
/pytorch/aten/src/THCUNN/SpatialClassNLLCriterion.cu:106: cunn_SpatialClassNLLCriterion_updateOutput_kernel: block: [3,0,0], thread: [743,0,0] Assertion t >= 0 && t < n_classes failed.
This error is sort of obvious because I am passing the 255 label, but since it is passed as ignore_index I assumed this error should not occur. In case this is not how ignore_index works, can you tell me how I can ignore the 255 label and still have 19 classes in the model?

I cannot reproduce this issue with this small code snippet:

import torch
import torch.nn as nn

x = torch.randn(10, 19, requires_grad=True, device='cuda')
y = torch.randint(0, 19, (10,), device='cuda')
y[0] = 255  # one target set to the ignored label

criterion = nn.CrossEntropyLoss(ignore_index=255)
loss = criterion(x, y)
print(loss)

Are you sure that the ignored index 255 is causing the issue and not another unexpected target index?

Hey ptr,
I have debugged it and you are right, some other label was causing the issue. The other labels occurred because I was using bilinear interpolation; when I changed it to nearest, those labels no longer appeared. Although this question is not related to PyTorch, I was also checking for such labels on a numpy array with:
label[label > 18] = 255
This statement was not working for the numpy array but worked in the case of a torch tensor, which is a little strange I guess. But anyway, thanks for the solution.

Good to hear it’s working now!
That’s indeed strange, as numpy should also be able to perform this check:

import numpy as np

x = np.random.randint(0, 20, (100,))
x[x > 10] = 255
print(x)

Yeah, that's strange, because it took me time to debug the problem only because of that numpy statement. Anyway, thanks for your time.

What if I want several “don’t care” classes? For example, in a time-series classification where I have to feed in all of the input, but not everything should contribute a gradient, and there are several such classes.

In that case you could change the targets to use the same “ignore index” and could then pass this index to the criterion so that it’ll be ignored.
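For example (a rough sketch; the class ids and shapes are made up):

import torch
import torch.nn as nn

# map every "don't care" class (say 7 and 9) to one shared ignore value
logits = torch.randn(8, 10, requires_grad=True)
target = torch.randint(0, 10, (8,))

ignore = -100  # also the default ignore_index of nn.CrossEntropyLoss
target[(target == 7) | (target == 9)] = ignore

criterion = nn.CrossEntropyLoss(ignore_index=ignore)
loss = criterion(logits, target)

Any value outside the valid class range [0, C-1] works, as long as you pass the same value to the criterion.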

The ignore_index value does not contribute to the input gradients, which means the pixels specified with this value have no role in the loss output. But it is not the same for testing, right?
In my case training works really well, but I am getting nan values from the pixels that were ignored during prediction, even after setting ignore_index to the value of the pixels to be ignored.

Yes, the loss calculation is usually not part of the model inference pipeline, so this ignore_index won’t be used there. How did you narrow down that the invalid values were caused by these pixel locations, and why do you include them at all if you always want to ignore them? If you don’t want to use them at all, you could manually mask their values in your eval run.
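A rough sketch of that manual masking (the shapes and the 255 placeholder value are made up):

import torch

# exclude the ignored locations from an eval metric manually
IGNORE = 255
logits = torch.randn(2, 19, 8, 8)          # model output (N, C, H, W)
target = torch.randint(0, 19, (2, 8, 8))
target[:, 0, :] = IGNORE                   # region you never trained on

preds = logits.argmax(dim=1)
valid = target != IGNORE                   # mask of locations that count

accuracy = (preds[valid] == target[valid]).float().mean()
print(accuracy)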

My dataset is a 40x40x40 set of voxels, with only some of them unoccupied. I am training a U-Net only on the occupied voxels for two classes. But when the model is in the prediction phase, the unoccupied voxels are also being predicted, but with nan values, which causes the corresponding loss to be nan too. I want to ignore these voxels in the prediction phase to obtain a numeric loss.
The only solution I could find was to use torch.nan_to_num(). What do you suggest to ignore the untrained, unneeded voxels?

It seems you do want to calculate the loss during the validation, in which case ignore_index would still work. My comment explained the common inference use case where no loss calculation is done and only the predictions are used. However, if you are still using nn.CrossEntropyLoss to calculate the loss (and thus have targets) you could still use ignore_index in its calculation.

Loss Function:

loss_fn: torch.nn.Module = nn.CrossEntropyLoss(ignore_index = 2, reduction='mean')

This is my code:

model.eval()
with torch.inference_mode():
    for batch, input_dataset in enumerate(dataloader):

        Input = input_dataset[0]
        Target = input_dataset[1]

        prediction = model(Input.float())

        loss = loss_fn(prediction, Target.long())
        test_loss += loss.item()

I am still getting nan from the pixels that have target value 2. I was expecting no contribution to the loss value from these pixels/voxels.

The loss values for class index 2 should be zero, but if your logits contain an invalid value (e.g. Inf), the F.log_softmax operation would still create NaNs for all other classes.
I don’t know why and how it was working before, but you might need to apply F.log_softmax manually, making sure the other values won’t be poisoned, and then apply nn.NLLLoss with ignore_index.
However, I’m unsure how you want to solve the issue for F.log_softmax as it needs to normalize the log probabilities.
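To illustrate (values are made up): a single Inf logit makes log_softmax non-finite for every class of that sample, and one possible workaround sketch is to sanitize the logits before the manual F.log_softmax + nll_loss path:

import torch
import torch.nn.functional as F

# a single inf logit poisons log_softmax for the whole row
logits = torch.tensor([[1.0, float('inf'), -2.0],
                       [0.5, 0.1, -1.0]])
print(F.log_softmax(logits, dim=1))  # row 0 is no longer finite for any class

# possible workaround: sanitize the logits first, then use the
# log_softmax + nll_loss path with ignore_index
clean = torch.nan_to_num(logits, posinf=1e4, neginf=-1e4)
log_probs = F.log_softmax(clean, dim=1)
target = torch.tensor([2, 0])  # suppose 2 is the ignored class
loss = F.nll_loss(log_probs, target, ignore_index=2)
print(loss)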

The answer to ‘I don’t know why and how it was working before’: my dataset has a set of 7 features. When training on the dataset (32, 4, 40, 40, 40) and introducing any of the features 3-7 in the channels, the loss is nan for testing only. However, this is not the case when I train the U-Net using the other features 0-2. The loss function always has ignore_index = 2, but somehow it doesn’t work with a particular set of features. Instead of predicting 0 for voxels with target value 2, it predicts nan values for them in this case.

I will apply the loss manually along with F.log_softmax and see how that works.

One of the outputs:
Epoch: 1 | train_loss: 0.4723 | train_acc: 0.9285 | test_loss: nan | test_acc: 0.9312