When to use ignore_index?

I noticed CrossEntropyLoss now has an ignore_index parameter, but I am not sure I understand what that means:

ignore_index (int, optional): Specifies a target value that is ignored
            and does not contribute to the input gradient. When size_average is
            True, the loss is averaged over non-ignored targets.

First of all, I know that CrossEntropyLoss takes a 1-dimensional array of targets:

Target: :math:`(N)` where each value is `0 <= targets[i] <= C-1`

So I assume that ignore_index allows you to ignore one of the outputs in the loss calculation. I can imagine it’s useful to mask a whole bunch of outputs, but what is the use case for ignoring only one output node?

I have probably misunderstood what ignore_index does. When do people use it?


This is used to mask a specific label.
For example, in semantic segmentation, we might have a -1 label that stands for “don’t care”, meaning that whatever you predict in that region is not taken into account in the evaluation (because it can be ambiguous).
In this case, you would set ignore_index to -1, so that every target with that label is left out of the loss and contributes no gradient.
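
A minimal sketch of that behavior, assuming a toy batch with made-up shapes (2 images, 3 classes, 4x4 pixels) and -1 as the “don’t care” label. It compares the criterion against a hand-computed average over the labeled pixels and checks that the ignored pixels receive no gradient:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical toy batch: 2 images, 3 classes, 4x4 pixels.
logits = torch.randn(2, 3, 4, 4, requires_grad=True)
target = torch.randint(0, 3, (2, 4, 4))
target[:, :2, :] = -1  # mark the top half of every image as "don't care"

criterion = nn.CrossEntropyLoss(ignore_index=-1)
loss = criterion(logits, target)
loss.backward()

# Reference: average the per-pixel loss over the labeled pixels only.
# clamp() just makes the ignored targets valid so the unreduced loss can be computed.
valid = target != -1
per_pixel = F.cross_entropy(logits.detach(), target.clamp(min=0), reduction="none")
reference = per_pixel[valid].mean()

print(torch.allclose(loss.detach(), reference))
print(logits.grad[:, :, :2, :].abs().sum())  # gradient in the ignored region
```

The ignored label does not need a matching output channel: the model here still predicts only 3 classes.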


Thanks for the clarification!


I would like to know more. I am using the torchvision segmentation model from your repo. There, the ignore_index is set to 255.

When using a dataset that has multiple classes to ignore during evaluation, say Cityscapes, I am manually using the training IDs and setting all those classes to 0, which is the same as the background class. How can I leverage ignore_index?


I am facing an issue when using ignore_index on the Cityscapes dataset for semantic segmentation. There are 19 classes plus one extra ignore class. I labeled this ignore class as 255 and passed the same value as ignore_index to the cross-entropy loss. When I run the model with 19 output classes, I get an assertion error from the cross-entropy loss:
/pytorch/aten/src/THCUNN/SpatialClassNLLCriterion.cu:106: cunn_SpatialClassNLLCriterion_updateOutput_kernel: block: [3,0,0], thread: [743,0,0] Assertion t >= 0 && t < n_classes failed.
This error is somewhat expected because I am passing the label 255, but since it is passed as ignore_index, I assumed the error would not occur. If this is not how ignore_index works, can you tell me how I can ignore the 255 label and still have 19 classes in the model?

I cannot reproduce this issue with this small code snippet:

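import torch
import torch.nn as nn

x = torch.randn(10, 19, requires_grad=True, device='cuda')  # logits for 19 classes
y = torch.randint(0, 19, (10,), device='cuda')  # valid targets in [0, 18]
y[0] = 255  # one target carries the ignored label

criterion = nn.CrossEntropyLoss(ignore_index=255)
loss = criterion(x, y)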

Are you sure that the ignored index 255 is causing the issue and not another unexpected target index?
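
One way to check for such unexpected target values is to list every label that is neither a valid class nor the ignored index. The sketch below is only an illustration: `unexpected_labels` is a hypothetical helper name, and the tiny target tensor is made up. Running a check like this on the CPU before calling the criterion tends to be easier to read than a device-side assertion.

```python
import torch

def unexpected_labels(target, num_classes, ignore_index):
    """Return the sorted label values that are neither valid classes nor ignored."""
    values = torch.unique(target)
    valid = (values >= 0) & (values < num_classes)
    return values[~valid & (values != ignore_index)].tolist()

# Made-up target: 19 and 130 are neither in [0, 18] nor equal to 255.
target = torch.tensor([[0, 18, 255], [7, 19, 130]])
bad = unexpected_labels(target, num_classes=19, ignore_index=255)
print(bad)  # [19, 130]
```

An empty list means every target is either a valid class index or the ignored label.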

Hey ptr,
I have debugged it and you are right: some other label was causing the issue. The other labels appeared because I was using bilinear interpolation; when I changed it to nearest, they no longer appeared. Although this question is not related to PyTorch, I was also checking for any stray label in the numpy array with:
label[label > 18] = 255
This statement did not work for the numpy array but worked for a torch tensor, which is a little strange, I guess. Anyway, thanks for the solution.
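
The interpolation effect described above can be reproduced with a small sketch, assuming a made-up 2x2 label map in which class 3 sits next to the ignore label 255. Bilinear resizing blends neighboring labels into values that are not labels at all, while nearest-neighbor resizing only copies existing ones:

```python
import torch
import torch.nn.functional as F

# Hypothetical 2x2 label map, shaped (N, C, H, W) and cast to float for interpolate.
label = torch.tensor([[3, 255], [3, 255]], dtype=torch.float32)[None, None]

nearest = F.interpolate(label, size=(4, 4), mode="nearest")
bilinear = F.interpolate(label, size=(4, 4), mode="bilinear", align_corners=False)

nearest_values = torch.unique(nearest).tolist()
bilinear_values = torch.unique(bilinear).tolist()

print(nearest_values)   # only the original labels
print(bilinear_values)  # also contains blended values between 3 and 255
```

Any blended value that is neither a class index nor the ignore label would trigger the assertion in the loss.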

Good to hear it’s working now!
That’s indeed strange, as numpy should also be able to perform this check:

import numpy as np

x = np.random.randint(0, 20, (100,))
x[x > 10] = 255  # boolean-mask assignment, same pattern as above

Yeah, that’s strange; it took me a while to debug the problem because of that numpy statement. Anyway, thanks for your time.

What if I want several don’t-care classes? For example, in time-series classification, where I have to feed in all the inputs, but not everything should contribute a gradient, and there are several such classes.

In that case you could remap all of those targets to the same “ignore index” and then pass this index to the criterion so that they will be ignored.
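
A minimal sketch of that remapping, assuming made-up data in which classes 3 and 4 are the don’t-care classes and 255 is chosen as the shared ignore index (any value outside the valid class range would do):

```python
import torch
import torch.nn as nn

IGNORE_INDEX = 255                 # assumed value, outside the valid class range
dont_care = torch.tensor([3, 4])   # hypothetical classes that should not contribute

logits = torch.randn(6, 5, requires_grad=True)  # 6 samples (or time steps), 5 classes
target = torch.tensor([0, 3, 1, 4, 2, 3])

# Replace every don't-care class with the single ignore index.
remapped = target.masked_fill(torch.isin(target, dont_care), IGNORE_INDEX)

criterion = nn.CrossEntropyLoss(ignore_index=IGNORE_INDEX)
loss = criterion(logits, remapped)
loss.backward()

print(remapped.tolist())                   # [0, 255, 1, 255, 2, 255]
print(logits.grad[[1, 3, 5]].abs().sum())  # gradient of the ignored samples
```

The same remapping would apply to a segmentation dataset with several classes to ignore: map all of them to one value and pass that value as ignore_index, rather than folding them into a real class such as the background.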