I noticed CrossEntropyLoss now has an ignore_index parameter, but I am not sure I understand what that means:
ignore_index (int, optional): Specifies a target value that is ignored
and does not contribute to the input gradient. When size_average is
True, the loss is averaged over non-ignored targets.
First of all, I know that CrossEntropyLoss takes a 1-dimensional array of targets:
Target: :math:`(N)` where each value is `0 <= targets[i] <= C-1`
So I assume that ignore_index allows you to ignore one of the target values in the loss calculation. I can imagine it being useful to mask a whole bunch of outputs, but what is the use case for ignoring just a single target value?
Have I misunderstood what ignore_index does, or when do people actually use it?
This is used to mask a specific label.
For example, in semantic segmentation, we might have a -1 label that stands for "don't care", meaning that whatever you predict in that region is not taken into account in the evaluation (because it can be ambiguous).
In this case, you would set the ignore_index to -1, so that those indices are not taken into account.
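To make that concrete, here is a minimal sketch (shapes and values are made up for illustration) showing that pixels labeled with the ignore index contribute nothing to the loss or the input gradient:

```python
import torch
import torch.nn as nn

# Hypothetical shapes: batch of 2, 5 classes, 4x4 "images".
criterion = nn.CrossEntropyLoss(ignore_index=-1)

logits = torch.randn(2, 5, 4, 4, requires_grad=True)  # raw model outputs
target = torch.randint(0, 5, (2, 4, 4))               # per-pixel class labels
target[:, 0, :] = -1                                  # mark the top row "don't care"

loss = criterion(logits, target)  # averaged over non-ignored pixels only
loss.backward()

# The ignored pixels receive exactly zero gradient:
assert torch.all(logits.grad[:, :, 0, :] == 0)
```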
I would like to know more. I am using the torchvision segmentation model from your repo. There, the ignore_index is set to 255.
When using a dataset that has multiple classes to ignore during evaluation, say Cityscapes, I am manually using the training IDs and setting all those classes to 0, which is the same as the background class. How can I leverage ignore_index instead?
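For context, the usual pattern looks something like the sketch below (the ID-to-train-ID entries are an illustrative subset, not the full Cityscapes table, and this is not the actual torchvision code): raw label IDs that have a train ID keep it, and everything else becomes 255 so that ignore_index=255 skips it:

```python
import torch

# Illustrative subset of a Cityscapes raw-ID -> train-ID table (not complete).
ID_TO_TRAIN_ID = {7: 0, 8: 1, 11: 2, 12: 3}

def remap(label: torch.Tensor, ignore_index: int = 255) -> torch.Tensor:
    """Map raw label IDs to train IDs; every unmapped ID becomes ignore_index."""
    out = torch.full_like(label, ignore_index)
    for raw_id, train_id in ID_TO_TRAIN_ID.items():
        out[label == raw_id] = train_id
    return out
```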
I am facing an issue when using ignore_index on the Cityscapes dataset for semantic segmentation. There are 19 classes plus one extra ignore class. I labeled this ignore class as 255 and passed the same value to the cross-entropy loss. When I run the model with 19 output classes, I get an assertion error from the loss:
Error:
/pytorch/aten/src/THCUNN/SpatialClassNLLCriterion.cu:106: cunn_SpatialClassNLLCriterion_updateOutput_kernel: block: [3,0,0], thread: [743,0,0] Assertion t >= 0 && t < n_classes failed.
This error makes some sense because I am passing the 255 label, but since 255 is set as the ignore_index, I assumed this error should not occur. If this is not how ignore_index works, can you tell me how I can ignore the 255 label while still having 19 classes in the model?
Hey ptr,
I have debugged it and you are right: some other label was causing the issue. The extra label appeared because I was resizing the label map with bilinear interpolation; when I changed it to nearest, the extra labels no longer appeared. Although this question is not related to PyTorch, I was trying to clamp stray labels on a numpy array with:
label[label>18] =255
This statement was not working on the numpy array but worked on the torch tensor, which is a little strange I guess. But anyway, thanks for the solution.
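For anyone hitting the same CUDA assert, a small sketch of the interpolation point above (the tensor values are made up): integer label maps must be resized with nearest-neighbor interpolation, because bilinear blends neighboring class IDs into invalid in-between values (e.g. somewhere between 18 and 255).

```python
import torch
import torch.nn.functional as F

# A tiny 2x2 label map with valid classes (0, 18) and the ignore value 255.
label = torch.tensor([[0., 255.], [18., 255.]]).view(1, 1, 2, 2)

# Nearest-neighbor resizing only copies existing values, so no new
# (invalid) label IDs can appear in the output.
resized = F.interpolate(label, size=(4, 4), mode="nearest")
assert set(resized.unique().tolist()) <= {0.0, 18.0, 255.0}
```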
What if I want several "don't care" classes? For example, in a time-series classification where I have to feed in all the input but not everything should contribute a gradient, and there are several such classes.
In that case you could change all of those targets to use the same "ignore index" and could then pass this index to the criterion so that it'll be ignored.
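A minimal sketch of that remapping (the "don't care" IDs 7, 8, 9 and the target values are arbitrary examples):

```python
import torch
import torch.nn as nn

# Collapse several "don't care" classes onto a single ignore index
# before computing the loss. -100 is nn.CrossEntropyLoss's default.
IGNORE = -100
dont_care = torch.tensor([7, 8, 9])

target = torch.tensor([1, 7, 2, 8, 3, 9, 4, 0])   # made-up labels
target[torch.isin(target, dont_care)] = IGNORE    # 7, 8, 9 all become -100

criterion = nn.CrossEntropyLoss(ignore_index=IGNORE)
logits = torch.randn(8, 10)
loss = criterion(logits, target)  # averaged over non-ignored targets only
```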
ignore_index does not contribute to the input gradients, meaning the pixels marked with that value play no role in the loss. But it is not the same for testing, right?
In my case training works really well, but during prediction I am getting NaN values from the pixels that were ignored, even after setting ignore_index to the value of those pixels.
Yes, the loss calculation is usually not part of the model inference pipeline, so ignore_index won't be used there. How did you narrow down that the invalid values were caused by these pixel locations, and why do you include them at all if you always want to ignore them? If you don't want to use them at all, you could manually mask their values in your eval run.
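A sketch of such manual masking (shapes and the ignore value 2 are arbitrary examples): since ignore_index plays no role at inference time, select only the valid locations before computing any metric.

```python
import torch

torch.manual_seed(0)
preds = torch.randint(0, 2, (40, 40, 40))    # model predictions
target = torch.randint(0, 3, (40, 40, 40))   # value 2 marks "don't care"

valid = target != 2                          # mask out ignored locations
accuracy = (preds[valid] == target[valid]).float().mean()
```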
My dataset is a 40x40x40 set of voxels, only some of which are occupied. I am training a U-Net on the occupied voxels only, for two classes. But in the prediction phase the unoccupied voxels are also predicted, with NaN values, which causes the corresponding loss to be NaN too. I want to ignore these voxels in the prediction phase to obtain a numeric loss.
The only solution I could find was to use torch.nan_to_num(). What do you suggest for ignoring the untrained, unneeded voxels?
It seems you do want to calculate the loss during the validation, in which case ignore_index would still work. My comment explained the common inference use case where no loss calculation is done and only the predictions are used. However, if you are still using nn.CrossEntropyLoss to calculate the loss (and thus have targets) you could still use ignore_index in its calculation.
The loss values for class index 2 should be zero, but if your logits contain the invalid value (e.g. Inf) the F.log_softmax operation would still create NaNs for all other classes.
I don't know why and how it was working before, but you might need to apply F.log_softmax manually, making sure the other values won't be poisoned, and then apply nn.NLLLoss with ignore_index.
However, I'm unsure how you want to solve the issue for F.log_softmax, as it needs to normalize the log probabilities.
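A minimal sketch of that manual route (shapes and the ignore index 2 follow this thread; the sanitizing step with torch.nan_to_num and the 1e4 clamp bounds are my own illustrative choices, not a general recommendation): sanitize the logits before the softmax normalization so one poisoned value cannot turn every class's log probability into NaN, then apply NLLLoss with ignore_index.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

logits = torch.randn(4, 3, 8, 8)
logits[0, :, 0, 0] = float("inf")   # simulate a poisoned logit
target = torch.randint(0, 3, (4, 8, 8))

# Replace non-finite logits before normalization; the bounds are arbitrary.
safe = torch.nan_to_num(logits, nan=0.0, posinf=1e4, neginf=-1e4)
log_probs = F.log_softmax(safe, dim=1)

loss = nn.NLLLoss(ignore_index=2)(log_probs, target)
assert torch.isfinite(loss)
```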
The answer to "I don't know why and how it was working before": my dataset has a set of 7 features. When I train on the dataset (32, 4, 40, 40, 40) and include any of features 3-7 in the channels, the loss is NaN during testing only. This does not happen when I train the U-Net with features 0-2. The loss function always has ignore_index = 2, but somehow it doesn't work with this particular set of features: instead of predicting 0 for voxels with target value 2, it predicts NaN for them.
I will apply the loss manually along with F.log_softmax and see how that works.